DESCRIPTION

MUTATIONS IN THE DIABETES SUSCEPTIBILITY GENES HEPATOCYTE NUCLEAR FACTOR (HNF)
1 ALPHA (α), HNF1β AND HNF4α

BACKGROUND OF THE INVENTION

1. Field of the Invention 

The present invention relates generally to the field of diabetes. More particularly, it concerns the
identification of genes responsible for diabetes for use in diagnostics and therapeutics.

2. Description of Related Art 

Diabetes is a major cause of health difficulties in the United States. Non-insulin dependent 
diabetes mellitus (NIDDM also referred to as Type 2 diabetes) is a major public health disorder of glucose 
homeostasis affecting about 5% of the general population in the United States. The causes of the fasting 
hyperglycemia and/or glucose intolerance associated with this form of diabetes are not well understood. 

Clinically, NIDDM is a heterogeneous disorder characterized by chronic hyperglycemia leading to 
progressive micro- and macrovascular lesions in the cardiovascular, renal and visual systems as well as 
diabetic neuropathy. For these reasons, the disease may be associated with early morbidity and 
mortality. 

Subtypes of NIDDM can be identified based, at least to some degree, on the time of onset of
the symptoms. The principal type of NIDDM has its onset in mid-life or later. Early-onset NIDDM, or
maturity-onset diabetes of the young (MODY), shares many features with the more common form(s) of
NIDDM whose onset occurs in mid-life. MODY is a form of non-insulin dependent (Type 2) diabetes
mellitus that is characterized by an early age at onset, usually before 25 years of age, and an
autosomal dominant mode of inheritance (Fajans 1989). Except for these features, the clinical
characteristics of patients with MODY are similar to those with the more common late-onset form(s)
of NIDDM.

Although most forms of NIDDM do not exhibit simple Mendelian inheritance, the contribution of
heredity to the development of NIDDM has been recognized for many years (Cammidge 1928) and the
high degree of concordance of NIDDM in monozygotic twins indicates that genetic factors play an
important role in its development.

MODY is characterized by its early age of onset, which is during childhood, adolescence or young
adulthood and usually before the age of 25 years. It has a clear mode of inheritance, being autosomal
dominant. Further characteristics include high penetrance (of the symptomology), and availability of
multigenerational pedigrees for genetic studies of NIDDM. MODY occurs worldwide and has been found
to be a phenotypically and genetically heterogeneous disorder.

A number of genetically distinct forms of MODY have been identified. Genetic studies have shown
tight linkage between MODY and DNA markers on chromosome 20, this being the location of the MODY1
gene (Bell et al., 1991; Cox et al., 1992; [illegible] et al., 1992). MODY2 is associated with
mutations in the glucokinase gene (GCK), located on chromosome 7 (Froguel et al., 1992 and 1993).
Recent linkage studies have shown the existence of a further form of MODY which has been termed
MODY3 (Vaxillaire et al., 1995). MODY3 has been shown to be linked to chromosome 12 and is localized
to a 5 cM region between markers D12S86 and D12S807/D12S820 of the chromosome (Menzel et al., 1995).

Although it is well established that MODY2 is associated with mutations in GCK, there is still no
information as to the identity of other MODY genes. There is a clear need to identify these genes and the
mutations that result in diseased states. The identification of these genes and their products will
facilitate a better understanding of the diseased states associated with mutations in these genes and has
important implications in the diagnosis and therapy of MODY.

Since an understanding of the molecular basis of diabetes in general and MODY specifically may
facilitate the development of new therapeutic strategies for the treatment of these disorders, studies are
needed to identify diabetes-susceptibility genes associated with MODY. Moreover, methods of detecting
individuals with a propensity to develop such diseases are needed. Where possible, the molecular
mechanism underpinning the genetic lesion should be determined in order to allow diagnosis and
specifically directed therapy.

SUMMARY OF THE INVENTION

The present invention relates to the inventors' discovery that the MODY3 locus is the HNF1α gene,
the MODY1 locus is the HNF4α gene and the MODY4 locus is HNF1β. The invention further relates to
the discovery that analysis of mutations in the HNF1α, HNF1β and HNF4α genes can be diagnostic for
diabetes. The invention also contemplates methods of treating diabetes in view of the fact that
mutations in HNF1α, HNF1β and HNF4α can cause diabetes.

In one embodiment, the invention contemplates methods for screening for diabetes mellitus.
These methods comprise: obtaining sample nucleic acid from an animal; and analyzing the nucleic acids to
detect a mutation in an HNF-encoding nucleic acid segment; wherein a mutation in the HNF-encoding nucleic
acid is indicative of a propensity for non-insulin dependent diabetes.

In certain embodiments the HNF-encoding nucleic acid is an HNF1α-encoding nucleic acid. In
view of the inventors' discovery that the MODY3 locus is HNF1α, a mutation in the HNF1α-encoding
nucleic acid is indicative of a propensity for diabetes. In some presently preferred embodiments, the
HNF1α-encoding nucleic acid is located on human chromosome 12q, which is the location of the
MODY3 locus. In other embodiments, the HNF-encoding nucleic acid is an HNF4α-encoding nucleic acid.
In view of the inventors' discovery that the MODY1 locus is HNF4α, a mutation in the HNF4α-encoding
nucleic acid is indicative of a propensity for diabetes. In some presently preferred embodiments, the
HNF4α-encoding nucleic acid is located on human chromosome 20, which is the location of the MODY1
locus.

It is important to note that the terms NIDDM, MODY, MODY1, MODY3, and MODY4 are used to
designate diabetes disease states, and the use of a particular such name may not always represent the
same causation of that disease state. The inventors have discovered that mutations in HNF4α can lead
to a MODY1 disease state; however, not all mutations in HNF4α that lead to diabetes might cause a
"MODY1" disease state. Conversely, not all diabetes disease states brought about by a mutation in
HNF4α might be considered a MODY1 disease state. Therefore, Applicants prefer to use, in some cases,
"HNF4α-diabetes" to note any diabetic disease state brought on by a mutation or malfunction of HNF4α,
even those that do not exhibit all, or any, MODY1 disease states. Likewise, Applicants may use
"HNF1α-diabetes" and "HNF1β-diabetes" rather than "MODY3" and "MODY4", respectively.

The nucleic acid to be analyzed can be either RNA or DNA. The nucleic acid can be analyzed in a
whole tissue mount, a homogenate, or, preferably, isolated from tissue to be analyzed. In some preferred
embodiments, the step of analyzing the HNF-encoding nucleic acid comprises sequencing of the
HNF-encoding nucleic acid to obtain a sequence; the sequence may then be compared to a native nucleic acid
sequence of HNF to determine a mutation. Such a native nucleic acid sequence of HNF1α may have the
sequence set forth in SEQ ID NO:1. Such a native nucleic acid sequence of HNF4α has a sequence set
forth in SEQ ID NO:78.
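
The comparison of a sample sequence against a native sequence can be illustrated with a minimal Python sketch. It assumes both sequences are in-frame coding sequences of equal length, so it only reports substitutions. The function name `find_codon_changes` and the short toy sequences are hypothetical and are not taken from SEQ ID NO:1 or SEQ ID NO:78.

```python
from itertools import product

# Standard genetic code, built in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def find_codon_changes(native, sample):
    """Return (codon_number, native_aa, sample_aa) for every codon that differs.

    Hypothetical helper: handles substitutions only, so both coding
    sequences must have the same length.
    """
    if len(native) != len(sample):
        raise ValueError("this sketch only handles equal-length sequences")
    changes = []
    for i in range(0, len(native) - len(native) % 3, 3):
        ref_codon, obs_codon = native[i:i + 3], sample[i:i + 3]
        if ref_codon != obs_codon:
            changes.append(
                (i // 3 + 1, CODON_TABLE[ref_codon], CODON_TABLE[obs_codon])
            )
    return changes


# Toy data: a C->T change turns codon 2 (CGA, Arg) into TGA (stop).
native = "ATGCGAGGTCAG"
sample = "ATGTGAGGTCAG"
print(find_codon_changes(native, sample))  # [(2, 'R', '*')]
```

Codons are numbered from 1, matching the way codon positions are cited in the text. A real analysis would also need to handle insertions, deletions and non-coding regions, which this sketch deliberately leaves out.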

The method allows for the diagnosis of almost any mutation, including, for example, point
mutations, translocation mutations, deletion mutations, and insertion mutations. The method of analysis
may comprise PCR, an RNase protection assay, an RFLP procedure, etc. Using this method, the inventors
have diagnosed a variety of HNF1α mutations, including those set forth in Table 8. In preferred
embodiments mutations occur at codons 17, 7, 27, 55/56, 98, 131, 122, 142, 129, 131, 159, 171, 229,
241, 272, 288, 289, 291, 292, 273, 379, 401, 443, 447, 459, 487, 515, 519, 547, 548 or 620 of an
HNF1α-encoding nucleic acid, for example, having the sequence of SEQ ID NO:1. In other
preferred embodiments a mutation occurs at the splice acceptor region of intron 5 and exon 6 of an
HNF1α-encoding nucleic acid. In other embodiments a mutation occurs at the splice acceptor region of
intron 9 of an HNF1α-encoding nucleic acid. In other embodiments, the mutation occurs, independently, in
intron 1, intron 2, intron 5, intron 7 or intron 9 of the HNF1α gene. The inventors have also found a variety
of HNF4α mutations, including those found in Table 10. In some preferred embodiments, the
HNF-encoding nucleic acid is an HNF4α-encoding nucleic acid and a mutation occurs in exon 7 of the
HNF4α-encoding nucleic acid. In other preferred embodiments, a mutation occurs at codon 268, 127, 130 or 154
of an HNF4α-encoding nucleic acid having the sequence of SEQ ID NO:78.
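
To illustrate why an insertion mutation can be as disruptive as a point mutation, the following Python sketch translates a toy coding sequence before and after a single-base insertion. The helper names `translate` and `insert_base` and the 18-base sequence are hypothetical illustrations, not the HNF1α or HNF4α sequences. The point it shows is general: inserting one base shifts the reading frame, and the shifted frame may contain an early stop codon.

```python
from itertools import product

# Standard genetic code, built in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a coding sequence codon by codon, stopping at the first stop (*)."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        protein.append(amino_acid)
        if amino_acid == "*":
            break
    return "".join(protein)


def insert_base(cds, position, base):
    """Return a copy of cds with one base inserted before the 0-based position."""
    return cds[:position] + base + cds[position:]


native = "ATGGCCCCTGAAAAGTAA"          # ATG GCC CCT GAA AAG TAA
mutant = insert_base(native, 6, "C")   # ATG GCC CCC TGA AAA GTA A

print(translate(native))  # MAPEK*
print(translate(mutant))  # MAP*
```

In this toy case the inserted C lands in a run of C residues, so the codon at the insertion site still encodes proline, but the very next codon in the shifted frame is TGA and translation terminates prematurely.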

The invention also contemplates methods of treating diabetes in an animal comprising: 
diagnosing an animal that has diabetes and modulating HNF function in the animal. 

The step of diagnosing an animal with diabetes frequently comprises analysis of an
HNF1α-encoding nucleic acid sequence or an HNF4α-encoding nucleic acid sequence for a mutation.

The step of modulating HNF function may comprise providing an HNF1α or HNF4α polypeptide to
the animal. In cases where normal HNF1α or HNF4α function is sought to be revived, the HNF1α or
HNF4α polypeptide may be a native HNF1α or HNF4α polypeptide. For example, a native HNF1α
polypeptide may have the sequence of SEQ ID NO:2. A native HNF4α polypeptide may have the sequence of
SEQ ID NO:79. The provision of an HNF1α or HNF4α polypeptide is accomplished in any of a number of
ways. For example, expression of an HNF1α or HNF4α polypeptide may be induced, with the expression
being of an HNF1α or HNF4α polypeptide encoded in the animal's genome or of an HNF1α or HNF4α
polypeptide encoded by a nucleic acid provided to the animal. The provision of an HNF1α or HNF4α
polypeptide may be accomplished by a method comprising introduction of an HNF1α or HNF4α-encoding
nucleic acid to the animal, for example, by injecting the HNF1α or HNF4α-encoding nucleic acid into the
animal.

Modulating HNF function in the animal can comprise providing a modulator of HNF1α or HNF4α
function to the animal. Such modulators are in the nature of drugs and can be, for example, HNF4, HNF6,
HNF3 or any other peptide or molecule that regulates HNF1α. These modulators may be formulated into
a pharmaceutical compound for delivery to the animal. The modulator of HNF1α, HNF1β or HNF4α
function may be an agonist or antagonist of HNF1α, HNF1β or HNF4α. The modulator may modulate
transcription of an HNF1α, HNF1β or HNF4α-encoding nucleic acid, translation of an HNF1α, HNF1β or
HNF4α-encoding nucleic acid, or the functioning of the HNF1α, HNF1β or HNF4α polypeptide.

The invention also contemplates methods of screening for modulators of HNF function
comprising: obtaining an HNF polypeptide, for example an HNF1α, HNF1β or HNF4α polypeptide;
determining a standard activity of the HNF; contacting the polypeptide with a putative modulator; and
assaying for a change in the standard activity of the polypeptide. In some preferred methods, the
standard activity profile of an HNF1α polypeptide is determined by measuring the binding of the HNF1α
polypeptide to a nucleic acid segment comprising the sequence of SEQ ID NO:9. To facilitate measuring
the HNF1α activity, the nucleic acid segment comprising the sequence of SEQ ID NO:9 or the HNF1α
polypeptide may comprise a detectable label. In some preferred methods, the standard activity profile of
an HNF4α polypeptide is determined by measuring the binding of the HNF4α polypeptide to a nucleic acid
segment comprising the sequence of SEQ ID NO:85. To facilitate measuring the HNF4α activity, the
nucleic acid segment comprising the sequence of SEQ ID NO:85 or the HNF4α polypeptide may comprise
a detectable label. In other embodiments, the standard activity profile of an HNF polypeptide is
determined by determining the ability of an HNF1α polypeptide to stimulate transcription of a reporter
gene, the reporter gene operatively positioned under control of a nucleic acid segment comprising the
sequence of SEQ ID NO:1. In other embodiments, the standard activity profile of an HNF polypeptide is
determined by determining the ability of an HNF4α polypeptide to stimulate transcription of a reporter
gene, the reporter gene operatively positioned under control of a nucleic acid segment comprising the
sequence of SEQ ID NO:78. Similar assays are contemplated for HNF1β polypeptide.

The invention also contemplates methods of screening for modulators of HNF polypeptide
function comprising: obtaining an HNF1α, HNF1β or HNF4α-encoding nucleic acid segment; determining
a standard transcription and translation activity of the HNF1α, HNF1β or HNF4α-encoding nucleic acid
sequence; contacting the HNF1α or HNF4α-encoding nucleic acid segment with a putative modulator;
maintaining the nucleic acid segment and putative modulator under conditions that normally allow for
HNF1α or HNF4α transcription and translation; and assaying for a change in the transcription and
translation activity.
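
The final step of such a screen, assaying for a change relative to the standard activity, can be sketched in Python as a simple fold-change comparison. This is only an illustrative sketch: the function name `classify_modulator`, the 1.5-fold cutoff and the example readings are assumptions made for the example and are not specified anywhere in this description.

```python
def classify_modulator(standard_activity, treated_activity, min_fold_change=1.5):
    """Label a putative modulator by comparing treated activity to the standard.

    Activities are in whatever unit the assay reports (for example, reporter
    signal or bound labeled probe). The 1.5-fold cutoff is an assumed,
    arbitrary threshold chosen for illustration.
    """
    if standard_activity <= 0 or treated_activity < 0:
        raise ValueError("standard must be positive and treated non-negative")
    fold_change = treated_activity / standard_activity
    if fold_change >= min_fold_change:
        return "agonist candidate"
    if fold_change <= 1 / min_fold_change:
        return "antagonist candidate"
    return "no clear effect"


# Hypothetical readings against a standard activity of 100 units.
for treated in (220.0, 40.0, 110.0):
    print(treated, classify_modulator(100.0, treated))
```

In practice the cutoff would be set from replicate measurements and the variability of the assay rather than fixed in advance.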

The inventors' discovery allows for the preparation of a host of HNF modulators such as
MODY3/HNF1α modulators, MODY4/HNF1β modulators and MODY1/HNF4α modulators. Such
modulators themselves are within the scope of the invention. Such an HNF modulator may be prepared or
preparable by a process comprising screening for modulators of HNF function comprising: obtaining an
HNF polypeptide; determining a standard activity profile of the HNF polypeptide; contacting the HNF
polypeptide with a putative modulator; and assaying for a change in the standard activity profile. An HNF
modulator may also be prepared by a process comprising screening for modulators of HNF function
comprising: obtaining an HNF-encoding nucleic acid segment; determining a standard transcription and
translation activity of the HNF nucleic acid sequence; contacting the HNF-encoding nucleic acid segment
with a putative modulator; maintaining the nucleic acid segment and putative modulator under conditions
that normally allow for HNF transcription and translation; and assaying for a change in the transcription
and translation activity.

Some aspects of the invention relate to isolated and purified polynucleotides encoding an HNF
polypeptide. Such polynucleotides can be: an HNF1α-encoding nucleic acid, an HNF1β-encoding nucleic
acid sequence, or an HNF4α-encoding nucleic acid. In some particular embodiments, the polynucleotide
encodes an HNF1α having an amino acid sequence as set forth in SEQ ID NO:127. In preferred
embodiments, the polynucleotide may be an HNF1α-encoding nucleic acid sequence having a sequence of
SEQ ID NO:126. In additional particular embodiments, the polynucleotide encodes an HNF1β having an
amino acid sequence as set forth in SEQ ID NO:139. In preferred embodiments, the polynucleotide may
be an HNF1β-encoding nucleic acid sequence having a sequence of SEQ ID NO:128. The polynucleotide
may encode an HNF4α having an amino acid sequence as set forth in SEQ ID NO:140. In preferred
embodiments, the polynucleotide may be an HNF4α-encoding nucleic acid sequence having a sequence of
SEQ ID NO:130.

Other embodiments comprise isolated and purified nucleic acid segments comprising 10, 14, 15,
25, 30, 35, 40, 45, 50, 55, 60, 70, 80, 90, 100, 125, 150, 175, 200, 250, 300, 350, 400, 450, or 500
contiguous nucleic acids identical to the sequence of SEQ ID NO:128 or SEQ ID NO:126 or the
complement of these sequences. These nucleic acid segments can be used by those of skill in the art as
hybridization probes, PCR primers, for the expression of HNF polypeptides, for the expression of other
polypeptides, etc. In some embodiments, the segment encodes a full-length HNF polypeptide. Of
particular interest are the promoters for HNF1α and HNF1β, which are disclosed in SEQ ID NOS:126 and
128, respectively, and in FIGs. 26 and 27, respectively, and discussed elsewhere in this application. These
promoters may be used by those of skill in the art in many varying applications.

BRIEF DESCRIPTION OF THE DRAWINGS

The following drawings form part of the present specification and are included to further 
demonstrate certain aspects of the present invention. The invention may be better understood by 
reference to one or more of these drawings in combination with the detailed description of specific
embodiments presented herein.

FIG. 1. Pedigrees of MODY3 families. The individuals studied in the Clinical Research Center at
the University of Chicago are indicated by MD-1-5 and 8-13 and those with NIDDM, IGT and NGT are
shown by black symbols, shaded symbols and open symbols, respectively. The asterisks indicate that
these individuals have inherited the at-risk haplotype associated with MODY3 in that family. The
genotypes and haplotypes for the P family have been described (Menzel et al., 1995) and the pairwise lod
score between MODY and the D12S76/D12S321 haplotype in this family is 2.06 at a recombination
fraction of 0.00. The pairwise lod score between MODY and D12S76 in pedigree F549 is 0.65 at a
recombination fraction of 0.00 (Vaxillaire et al., 1995). The pedigrees BDA1 and BDA12 have not been
previously described. MODY co-segregates with markers tightly linked to MODY3 in these families with
pairwise lod scores between MODY and D12S86 of 1.94 and 0.60, respectively, at a recombination
fraction of 0.00.






WO 98/11254

PCT/US97/16037


FIG. 2. Average glucose (A), insulin (B) and insulin secretion rate (ISR) (C) profiles in 7 diabetic
MODY3 subjects (□), 6 nondiabetic MODY3 subjects (•) and 6 control subjects (○), during the stepped
glucose infusion studies. After a 30 min period of baseline sampling, glucose was infused at rates of 1,
2, 3, 4, 6, and 8 mg·kg⁻¹·min⁻¹. Each infusion rate was administered for a period of 40 min and glucose,
insulin and C-peptide were measured at 10, 20, 30 and 40 min into each period.

FIG. 3. Relationship between average plasma glucose concentrations and ISRs during the
stepped glucose infusion studies in 7 diabetic MODY3 subjects (□), 6 nondiabetic MODY3 subjects (•)
and 6 control subjects (○). The lowest glucose levels and ISRs were measured under basal conditions
and subsequent levels were obtained during glucose infusion rates of 1, 2, 3, 4, 6 and 8 mg·kg⁻¹·min⁻¹,
respectively.

FIG. 4. Graded intravenous glucose infusions were administered to 6 controls (A), 6 nondiabetic
MODY3 subjects (B) and 7 diabetic MODY3 subjects (C) after an overnight fast (baseline) and after a
42 h intravenous infusion of glucose (postglucose [○]) at a rate of 4-6 mg·kg⁻¹·min⁻¹.

FIG. 5A, FIG. 5B, FIG. 5C, FIG. 5D, FIG. 5E, FIG. 5F and FIG. 5G. MODY3 pedigrees showing
co-segregation of mutant HNF1α allele with diabetes mellitus. Males are noted by square symbols and
females by circles. Individuals with NIDDM are noted by black symbols and those with gestational-onset
diabetes or impaired glucose tolerance by shaded symbols. A diagonal line through the symbol indicates
that the individual is deceased.

The individual ID is noted at the top right corner of each symbol and the HNF1α genotype, if
determined, noted below: N, normal allele; M, mutant allele. The arrow indicates the individual from each
pedigree who was screened for mutations. Note that some individuals have inherited the mutant allele but
do not yet have NIDDM, usually because of their young age (e.g. P pedigree, individual IV-8, and Ber
pedigree, individual V-2). Also, some individuals have NIDDM even though they did not inherit the mutant
HNF1α allele segregating in that family (e.g. Ber pedigree, individual II-2). Such heterogeneity has been
noted previously (Bell 1991) and is a reflection of the high prevalence of NIDDM.

FIG. 6. The involvement of hepatocyte nuclear factors in diabetes.

FIG. 7. An alignment of the human (h) HNF4α protein sequence with sequences from
mouse (m), Xenopus (x) and Drosophila (d) species. The putative DNA binding sites are underlined
and the putative ligand binding sites are in bold.

FIG. 8A, FIG. 8B, FIG. 8C, FIG. 8D, FIG. 8E, FIG. 8F, FIG. 8G, FIG. 8H, FIG. 8I and FIG. 8J. The
DNA sequences for exon 1, exon 2, exon 3, exon 4, exon 5, exon 6, exon 7, exon 8, exon 9 and exon 10 of
HNF4α.

FIG. 9. Physical map of the MODY3 region of chromosome 12. YAC, BAC (b) and PAC (p) clones are
represented as lines, the length of which reflects the number of included STSs and not the actual size. The
physical distance between adjacent STSs has not been determined directly and STSs for which the order
has not been unambiguously determined are indicated in brackets. A circle indicates that the clone was
positive for the indicated STS and a square indicates a STS derived from the end of that specific clone.
Several YACs contain large internal deletions which are noted by brackets. The STSs are from GDB™ and
the GenBank STS databases.

FIG. 10. Partial sequence of exon 4 of the HNF1α gene of individual EA1 (Edinburgh pedigree). The
sequences of the normal and mutant alleles are shown. There is an insertion of a C in codon 291 (noted by
the arrowhead) in the mutant allele resulting in a frameshift and premature termination.

FIG. 11. The cDNA sequence of HNF1α denoting position of the exons.

FIG. 12. Model of the human HNF-4α showing the different patterns of alternative splicing and
structures of the different forms of HNF-4α that can be generated by alternative splicing. The amino
acids that define the boundaries of some of the regions of the protein are shown. DBD and LBD
correspond to the DNA and ligand-binding domains of HNF-4α, respectively.

FIG. 13. Comparison of the sequences of the promoter regions of the human and mouse HNF-4α
genes (SEQ ID NO:135 and SEQ ID NO:137, respectively). Identical residues are shown in boxes. The
binding sites for transcription factors that may regulate the expression of HNF-4α are overlined. The
asterisk notes the predicted transcriptional start site based on the study of the mouse HNF-4α gene
(Zhong et al., 1994). The minimal promoter region required for high-level expression of the mouse gene in
hepatoma cells is shown by shading. The ATG codon which defines the start of translation is noted. The
arrowhead shows the DNA polymorphism found in the promoter region of the proband of family J2-96.
The GenBank accession nos. for the mouse promoter sequence are S74519 and S77762.

FIG. 14A and FIG. 14B. Partial sequence of exon 4 of HNF4α gene of patient J2-21. The
sequences of the normal (FIG. 14A; SEQ ID NO:141 and corresponding amino acids SEQ ID NO:142) and
mutant (FIG. 14B; SEQ ID NO:143) alleles are shown and the arrow indicates the C→T substitution at
codon 127.

FIG. 15. Pedigrees of Japanese families with mutations/polymorphisms in the HNF-4α gene.
Individuals with diabetes are noted by filled symbols and nondiabetic (or not tested) individuals are
indicated by open symbols. The arrow indicates the proband. The clinical features of each subject are
shown including age at diagnosis, present age and present treatment. The HNF4α genotype of tested
individuals is noted: N, normal and M, mutation/polymorphism.

FIG. 16. Identification of a nonsense mutation in the HNF4α gene in a German family, the
Dresden-11 pedigree. The members of this family with MODY and impaired glucose tolerance are
indicated with black and shaded symbols, respectively. The age at diagnosis of diabetes mellitus, present
age and therapy (OHA, oral hypoglycemic agents), and nature of complications (M, macrovascular disease;
R, retinopathy; and N, peripheral polyneuropathy) are indicated. The haplotype associated with MODY in
this family is shown.

FIG. 17. Partial sequence of exon 4 of the HNF4α gene of subject II-4 of the Dresden-11
pedigree. The R154X mutation is indicated (SEQ ID NO:144 and SEQ ID NO:145). Intron 4 follows the
Gln codon, CAG.

FIG. 18A, FIG. 18B, FIG. 18C and FIG. 18D. Oral glucose tolerance testing in the Dresden-11
family. The blood glucose (FIG. 18A), insulin (FIG. 18B), C-peptide (FIG. 18C) and proinsulin (FIG. 18D)
levels during the course of the glucose tolerance test are shown. The open symbols are the means±SEM
for subjects with the R154X mutation, including those with diabetes and impaired glucose tolerance, and
the filled symbols are the means for the two normal subjects.

FIG. 19A, FIG. 19B, FIG. 19C and FIG. 19D. Effect of bolus and infusion of arginine, of
glucose, and of arginine during hyperglycemic clamp on plasma concentration of glucose (FIG. 19A),
insulin (FIG. 19B), C-peptide (FIG. 19C), and glucagon (FIG. 19D) in 3 groups of subjects of the RW
pedigree.

FIG. 20A and FIG. 20B. Acute insulin (FIG. 20A) and C-peptide (FIG. 20B) response to bolus
administration of arginine in 3 groups of subjects of the RW pedigree at baseline and during the
hyperglycemic clamp procedure. The slope of the line connecting these insulin responses (slope of
potentiation) was lower in ND[+] vs. ND[-], p < 0.001. The slope for D[+] was lowest




FIG. 21. MODY pedigree, Italy-1. Subjects with MODY and impaired glucose tolerance are
indicated by filled and cross-hatched symbols, respectively. Nondiabetic subjects (by testing or history)
are indicated by open symbols. The clinical features of the subjects are noted below the symbol, including
current treatment: insulin or oral hypoglycemic agent (OHA). The haplotype at the markers
D12S321-D12S76-UC-39 is shown and the at-risk haplotype is noted by shading. The HNF-1α genotype is
shown: N, normal; M, mutant (A→C substitution at nucleotide -58). Although treated with insulin, subject
III-9 has a fasting C-peptide value of 1.2 ng/ml, indicating that she has MODY rather than
insulin-dependent diabetes mellitus.

FIG. 22. Comparison of the sequence of the promoter region of the human, rat, mouse, chicken
and frog HNF-1α genes (SEQ ID NO:134; SEQ ID NO:138; SEQ ID NO:136; SEQ ID NO:132; SEQ ID
NO:133, respectively). The A→C substitution at nucleotide -58 and HNF4α binding site are shown.
Residues identical to the human sequence are boxed. Nucleotides are numbered relative to the
transcriptional start site of the human gene (indicated by an asterisk). The boxed ATG triplet is the
initiating methionine. The dashes indicate gaps introduced in the sequences to generate this alignment.

FIG. 23. Summary of mutations in the human HNF-1α gene. This cartoon shows the exons and
promoter region as boxes. The mutations and amino acid polymorphisms are from Yamagata et al., 1996;
Lehto et al., 1997; Kaisaki et al., 1997; Vaxillaire et al., 1997; Frayling et al., 1997; Hansen et al.,
1997; Urhammer et al., 1997; Glucksmann et al., 1997. The amino acid polymorphisms are I/L27, A/V98
and S/N487. The single-letter abbreviations for the amino acids are used.

FIG. 24. Partial sequence of exon 2 of HNF1β gene of subject J2-20 (SEQ ID NO:146 and SEQ
ID NO:147). The C→T mutation in codon 177 is indicated.

FIG. 25. J2-20 pedigree. Individuals with diabetes mellitus are noted by filled symbols. The
arrow indicates the proband. The present age, age at diagnosis, current treatment and complications are
shown. The HNF-1β genotype is noted: N, normal; M, mutant. OHA, oral hypoglycemic agent; PDR,
proliferative diabetic retinopathy; CRF, chronic renal failure; and DKA, diabetic ketoacidosis.

FIG. 26A-FIG. 26M. Partial sequence of human HNF1α gene, SEQ ID NO:126 and SEQ ID
NO:127. These figures depict a contiguous sequence and have been split into panels due to the size of the
sequence. The nucleotide and predicted amino acid sequences are shown. Exon and intron sequences are
in uppercase and lower case, respectively. The approximate size of the gaps in the introns, the complete
sequence of which was not determined, are noted. In the promoter region, potential binding sites for
transcription factors that may regulate expression of this gene are indicated, with sites identified by
DNase footprinting in italics, those identified by sequence homology in normal type. The minimal
promoter region is shown in boldface type. The polymorphisms and mutations in the HNF1α gene
identified to date are shown in boldface type with the designation of the mutation noted. The asterisk
notes the predicted transcriptional start site based on studies of rat HNF1α gene. The letter n indicates
that the sequence was ambiguous at this site.

FIG. 27A-FIG. 27I. Partial sequence of human HNF1β gene, SEQ ID NO:128, SEQ ID NO:129
and SEQ ID NO:139. These figures depict a contiguous sequence and have been split into panels due to
the size of the sequence. The nucleotide and predicted amino acid sequences are shown. Exon and intron
sequences are in uppercase and lower case, respectively. The approximate size of the gaps in the
introns, the complete sequence of which was not determined, are noted. In the promoter region, potential
binding sites for transcription factors that may regulate expression of this gene are indicated, with sites
identified by DNase footprinting in italics, those identified by sequence homology in normal type.

FIG. 28A-FIG. 28V. Partial sequence of human HNF4α gene, SEQ ID NO:130, SEQ ID NO:131
and SEQ ID NO:140. These depict a contiguous sequence and have been split into panels due to the size
of the sequence. The nucleotide and predicted amino acid sequences are shown. Exon and intron
sequences are in uppercase and lower case, respectively.

DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS

The present invention concerns the early detection, diagnosis, prognosis and treatment of diabetes.
The present invention describes for the first time mutations responsible for HNF1α, HNF1β and
HNF4α-related diabetes. The specific mutations and identity of the corresponding wild-type genes from
diabetic subjects are disclosed. These mutations are indicators of HNF1α, HNF1β and HNF4α-related
diabetes and are diagnostic of the potential for the development of diabetes. It is envisioned that the
techniques disclosed herein will also be used to identify other gene mutations responsible for other forms
of diabetes.

Those skilled in the art will realize that the nucleic acid sequences disclosed will find utility in a
variety of applications in diabetes detection, diagnosis, prognosis and treatment. Examples of such
applications within the scope of the present invention include amplification of markers of MODY using
specific primers; detection of markers of HNF1α, HNF1β and HNF4α by hybridization with oligonucleotide
probes; incorporation of isolated nucleic acids into vectors and expression of vector-incorporated nucleic
acids as RNA and protein; development of immunologic reagents corresponding to gene encoded products;
and therapeutic treatment for the identified MODY using these reagents as well as antisense nucleic acids,
or other inhibitors specific for the identified MODY. The present invention further discloses screening assays
for compounds to upregulate gene expression or to combat the effects of the mutant HNF1α, HNF1β and
HNF4α genes.

A. DIABETES AND MODY 

Diabetes mellitus affects approximately 5% of the population of the United States and over 100
million people worldwide (King et al., 1988; Harris et al., 1992). A better way of identifying the populace
who are at risk of developing diabetes is needed, as a subject may have normal plasma glucose
concentrations but may be at risk of developing overt diabetes. These issues could be resolved if it were
possible to diagnose susceptible people before the onset of overt diabetes. This is presently not possible
with subjects having classical diabetes due to its multifactorial nature.

MODY is a monogenic form of diabetes and thus the genes responsible can be more easily studied
than those whose mutation contributes to the development of polygenic form(s) of this disorder such as
type 1 and type 2 diabetes mellitus. Recent studies of subjects with maturity-onset diabetes of the young
(MODY), a subset of diabetes characterized by diabetes in the first or second decade of life and autosomal
dominant inheritance, have shown that MODY may result from mutations in genes on chromosome 20
(HNF4α/MODY1), chromosome 7 (glucokinase/MODY2), chromosome 12 (HNF1α/MODY3) and chromosome 17
(HNF1β/MODY4).

The clinical characteristics that manifest in HNF4α, HNF1α and HNF1β type diabetes resemble
those seen in patients with type 2 diabetes. These characteristics include frequent severe fasting
hyperglycemia, the need for oral hypoglycemic agents, eventual insulin requirements, and vascular and
neuropathic complications (Fajans et al., 1994; Menzel et al., 1995).

The inventors have shown that prediabetic subjects with mutations in the HNF1α and HNF4α
genes have subtle but important alterations in the normal pattern of glucose-stimulated insulin secretion.
Compared to control subjects with no family history of diabetes, they had normal insulin secretion rates
at lower glucose concentrations. However, the increase in insulin secretion rate resulting from an increase
in the plasma glucose concentration above 8 mM was less in prediabetic HNF1α-mutation subjects than in
controls (see FIG. 2-FIG. 4).

Exposure of the normal β-cell to increased plasma glucose concentrations for 42 hours results in
an increase in β-cell responsiveness to a subsequent glucose stimulus. Following a 42-hr glucose infusion
which raised the plasma glucose concentration to an average value of 7.1 ± 1.4 mM, the insulin secretion
rate of prediabetic HNF1α-mutation subjects increased by 35% between 5 and 9 mM glucose, with a resultant
shift in the dose-response curve to the left. Five out of six prediabetic HNF1α-mutation subjects showed
this increase in insulin secretion rate, and only one subject, MD13, failed to demonstrate this effect. The
magnitude of this priming effect of glucose was similar to that seen in the controls.

Diabetic HNF1α-mutation subjects demonstrated diminished insulin secretion across the entire
range of glucose concentrations studied. Thus, over the concentration range between 5 and 9 mM
glucose, the diabetic subjects secreted 50% less insulin than the controls and 51% less than the
prediabetic HNF1α-mutation subjects. Furthermore, the priming effect of glucose was lost in the
subjects with overt diabetes.

Evaluation of insulin resistance indicated that HNF1α-mutation subjects were no more resistant
than the controls. In fact, there was a tendency towards a lesser degree of insulin resistance in the
HNF1α-mutation subjects, making it highly unlikely that insulin resistance plays a primary role in the
pathophysiology of diabetes in these subjects.

The inventors have recently characterized insulin secretory responses in prediabetic HNF4α- and
HNF1α-mutation subjects. Prediabetic HNF4α- and HNF1α-mutation subjects both have reduced insulin
secretory responses to glucose which are evident only as the plasma glucose rises above a threshold of 7
or 8 mM, respectively. Whereas in HNF1α-mutation subjects the priming effect of glucose on insulin
secretion is retained, a low-dose glucose infusion did not have any significant effects on insulin secretion
in prediabetic HNF4α-mutation subjects (Byrne et al., 1995b). In subjects with mutations in the
glucokinase gene, the dose-response curve is shifted to the right and the insulin secretion rate (ISR) is
markedly decreased at glucose concentrations below 7 mM, but insulin secretion continues to increase with
increasing plasma glucose concentrations even above levels of 8 mM. The priming effect of glucose on
insulin secretion also is preserved (Byrne et al., 1994). The inventors have recently performed similar
studies in subjects with classical Type 2 diabetes and impaired glucose tolerance (IGT). In subjects with
IGT, although the dose-response curve relating glucose and insulin secretion was shifted to the right, the
priming effect of glucose on insulin secretion was retained. In subjects with overt Type 2 diabetes, the
increase in insulin secretion in response to an increase in glucose was markedly reduced and the priming
effect of glucose on insulin secretion was lost.

It thus appears that β-cell dysfunction plays an important pathophysiologic role in the
development of the three forms of MODY which have been characterized to date. A clear prediabetic
phase has not been identified in subjects with glucokinase mutations. However, profound defects in the
ability of the β-cell to respond to a glucose stimulus are present even in the face of the mild elevations in
glucose which characterize the majority of these subjects. By contrast, a prediabetic phase is a feature
of the HNF4α and HNF1α forms of diabetes. These prediabetic subjects have reduced insulin secretory
responses to elevated concentrations of glucose induced by the step-wise glucose infusion prior to onset
of diabetes. Prediabetic HNF4α and HNF1α subjects can be distinguished based on the effects of a
low-dose glucose infusion on insulin secretion. The priming effect of glucose on insulin secretion is
retained in HNF1α subjects in the prediabetic phase but is lost after the onset of overt hyperglycemia,
whereas this priming effect is absent in HNF4α diabetes even in the prediabetic phase of the disease. The
severe reductions in insulin secretory responses to glucose seen in the overtly diabetic HNF1α subjects are
likely to be due in part to the effects of high glucose, in view of the well documented adverse effects of
hyperglycemia on insulin secretion. A full understanding of the reasons for these changes in the
dose-response relationships between glucose and insulin secretion requires a better understanding of the
roles of HNF4α and HNF1α in regulating normal pancreatic β-cell function.

Further studies by the inventors have shown that elevations in the 2-hr post-challenge blood
glucose levels predict alterations in insulin secretory responses to glucose. However, in that case,
subjects with impaired glucose tolerance demonstrated reduced insulin secretory responses over a range
of glucose concentrations and not just in response to increases in glucose above 8 mM as was seen in the
prediabetic HNF1α-mutation subjects. Thus, the inventors do not believe that the alterations in insulin
secretion seen in the prediabetic HNF1α subjects resulted from the modest elevations in glucose. Rather,
the inventors' results suggest that the percent priming and overall insulin secretion rates deteriorate as
glucose tolerance deteriorates, and the lack of ability to increase insulin secretion at high glucose levels
is a feature of the mutation in the HNF1α gene.

From the studies described above and in the Examples that follow, it is clear that the identification
and characterization of the gene(s) associated with MODY diabetes is important. Mutations in such
genes lead to diabetes and it would be diagnostically and therapeutically advantageous to identify the
mutations in subjects predisposed to such mutations.

Studies attempting to find the location of the MODY3 gene showed that the putative gene linked
to MODY3 type diabetes was localized to a 5 cM interval between the markers D12S86 and
D12S807/D12S820 (Menzel et al., 1995). However, the identity of the gene has not been elucidated.
The present invention for the first time shows that the gene linked to MODY3 expresses a factor
previously identified from hepatocytes known as hepatocyte nuclear factor 1α, herein referred to as
HNF1α.

Similarly, studies attempting to find the location of the MODY1 gene showed that the putative
gene linked to MODY1 type diabetes was localized to a 13 cM interval between the markers D20S169
and D20S176 (Stoffel et al., 1996). Likewise, as with MODY3, the identity of the gene in MODY1 has
not been elucidated. The present invention for the first time shows that the gene linked to MODY1
expresses a factor previously identified from hepatocytes known as hepatocyte nuclear factor 4α, herein
referred to as HNF4α.

Subsequently, the inventors performed studies to elucidate the genetic defects responsible for
other forms of MODY. The present invention for the first time shows that MODY is likely a consequence
of mutations in hepatocyte nuclear factor 1β, herein referred to as HNF1β.

The association of mutations in HNF1α, HNF1β and HNF4α with diabetes indicates the
importance of the HNF network in controlling pancreatic β-cell function and glucose homeostasis. Hence
the studies presented here have categorized exemplary mutations in the HNF1α, HNF1β and HNF4α genes
as identified by PCR techniques. These landmark results form the basis of many therapeutic and
diagnostic techniques as measures to alleviate diabetes, particularly HNF1α-diabetes, HNF1β-diabetes
and HNF4α-diabetes.

B. HEPATOCYTE NUCLEAR FACTORS ARE THE GENES LINKED TO MODY TYPE DIABETES 
Hepatocyte Nuclear Factor 1α

Hepatic nuclear factor 1α (also known as APF, LFB1 or HP1) has been described as a
sequence-specific DNA binding protein from rat liver. It is thought to interact with promoter elements
present in many genes including albumin, α- and β-fibrinogen, α-1-antitrypsin, α-fetoprotein, pyruvate
kinase, transthyretin and aldolase B among others. HNF1α has been purified from rat liver extracts by DNA
affinity chromatography using a fibrinogen promoter element (Courtois et al., 1987) and was characterized
as a single 88 kDa protein. It is now known that HNF1α is a transcription factor.

Mendel and Crabtree (1993) suggested that HNF1α interacted with "hepatocyte specific" genes
in which it plays a prominent role in regulation of both in vitro and in vivo transcription. However, it was
later shown that HNF1α mRNA can also be found in several non-hepatocyte tissues including the kidney,
stomach, intestines, thymus, spleen and pancreas (Baumhueter et al., 1990; Kuo et al., 1990). This
suggests that HNF1α expression may participate in the differentiation of non-hepatic organs as well as
hepatogenesis.

Transcription factors are proteins that control transcription by binding to cis-acting regulatory
DNA sequences in a gene. As such, these factors play a crucial role in development and differentiation by
dictating the pattern of expression of genes within specific cells and tissues.

The homeodomain proteins are a class of transcription factors. These proteins all possess the
unusual characteristic of having very similar DNA-binding domains even though they mediate diverse
effects. HNF1α is an example of a homeodomain protein. HNF1α has been shown to dimerize with itself
in solution. It appears that maximal transcriptional activation by HNF1α requires a novel dimerization
cofactor. This cofactor, known as the dimerization cofactor of HNF1α (DCoH), does not in itself bind
DNA; rather, it binds HNF1α.

HNF1α binds to DNA as a dimer; this was confirmed from studies on the purification and cloning
of HNF1α. Other studies showed that there was a DNA binding protein that binds to the HNF1α binding
site in cells that lack the HNF1α mRNA. This second protein, HNF1β, is a homolog of HNF1α but is the
product of a separate gene.

Regulation studies of the HNF1α promoter showed that binding sites for transcription factors
HNF3, AP1 and HNF4α are essential for the expression of HNF1α (Hansen and Crabtree, 1993). It has
been demonstrated that HNF4α is located on chromosome 20 of the human genome. The present
inventors suggest that MODY1, which is known to be linked to chromosome 20, may act as a regulator of
MODY3 gene expression; as such, mutations in HNF4α may be responsible for the MODY1 form of diabetes.

HNF1α proteins possess three functional regions, namely, the dimerization, activation and DNA
binding domains. The dimerization domain is localized to the first 32 amino acids of the HNF1α proteins.
The DNA-binding domain is a POU-like homeodomain which binds to a 13 bp palindromic DNA sequence in
the promoters of genes bound by HNF1α (Courtois et al., 1988; Frain et al., 1989). The consensus
sequence for this HNF1α binding site on these genes is:

GTTAATNATTACC (SEQ ID NO:9)

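
By way of illustration only, a consensus of this kind can be used to scan a nucleotide sequence for candidate binding sites. The following is a minimal Python sketch, not a method described in this specification. It assumes that N denotes any of the four bases, that only exact matches on the given strand are of interest, and the function names and the example fragment are hypothetical.

```python
import re

# Consensus HNF1α binding site quoted in the text (SEQ ID NO:9); N is taken to mean any base.
HNF1A_CONSENSUS = "GTTAATNATTACC"


def consensus_to_regex(consensus: str) -> str:
    """Translate a consensus string into a regex source, mapping N to [ACGT]."""
    return "".join("[ACGT]" if base == "N" else base for base in consensus)


def find_sites(sequence: str, consensus: str = HNF1A_CONSENSUS) -> list:
    """Return 0-based start positions of exact consensus matches on the given strand."""
    # A zero-width lookahead reports overlapping occurrences as well.
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [match.start() for match in pattern.finditer(sequence.upper())]


# Hypothetical promoter fragment, invented for illustration only.
example_fragment = "ccaGTTAATGATTACCtt"
print(find_sites(example_fragment))  # prints [3]
```

Natural binding sites often deviate from the consensus at one or more positions, so a scan tolerant of mismatches, or one that also examines the reverse complement, may be more appropriate in practice.
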
Diabetes mellitus alters the transcription of numerous genes in many different tissues. The
mechanisms underlying these alterations in transcription are largely unknown. One example of altered
transcription is seen in the reduced transcription of the albumin gene in diabetes (Wanke et al., 1991).
Recently, it has been demonstrated that HNF1α protein levels are reduced in diabetes, leading to the
theory that decreased gene transcription in diabetes is due to decreased levels of HNF1α, a factor critical
for the regulation of hepatic albumin gene expression. This is thought to be the case in other genes that
possess an HNF1α binding site and are affected by diabetes. Therefore, changes in the abundance of
HNF1α in diabetes appear to affect the expression of genes whose expression is predominantly
regulated by this factor.

The expression of the insulin gene in adult mammals is localized to the β cells in the pancreatic
islets. Studies of this gene have defined a small region in the promoter, the FF-minienhancer, capable of
conferring tissue-specific and glucose-responsive transcriptional activity on a heterologous promoter
(German et al., 1990). This minienhancer region is composed of two primary regulatory elements, the Far
box and the FLAT element, which interact to upregulate transcription.

Further analysis of the FLAT element showed it to be a cluster of several cis loci that mediate
discrete positive and negative effects. The positive locus is characterized as FLAT-F and its activity is
only revealed when there is a mutation in the negative locus FLAT-E. This FLAT-F region is able to
specifically bind a number of DNA-binding proteins. The sequence of FLAT-F has significant similarity to
the consensus sequence of HNF1α. This led to studies to determine whether HNF1α itself may play a
role in the transcriptional regulation of the rat insulin gene. Subsequently, it was shown that HNF1α
expression is present in the pancreatic β-cell derived insulinoma cell line HIT. HNF1α has been shown to
bind to and transactivate rat insulin gene enhancers that contain an HNF1α site.

Hepatocyte Nuclear Factor 4α

Hepatocyte nuclear factor 4α (HNF4α) is another transcription factor first associated with the
liver and having limited tissue distribution (Xanthopoulos et al., 1991; Zhong et al., 1994). HNF4α can
activate transcription in several non-hepatic cell lines, indicating that no liver-specific modification is
required for its function (Sladek et al., 1990).

It has been observed that there is an apparent contradiction between the molecular mass of
HNF4α predicted from the primary sequence (50.6 kDa) (Sladek et al., 1990) and that determined by gel
electrophoresis (54 kDa), suggesting that this difference may be due to post-translational modification(s).
Of the many types of post-translational modifications that might regulate gene expression, most attention
has been focused on phosphorylation, which can influence transcription factor activity in many ways
(Hunter and Karin, 1992).

Three main levels of regulation have been described: phosphorylation can affect the DNA-binding
activity (Boyle et al., 1991; Segil et al., 1991; Shuai et al., 1994), the transcriptional activation potential
(Yamamoto et al., 1988; Trautwein et al., 1993), or the translocation of a transcription factor from the
cytoplasm into the nucleus (Metz and Ziff, 1991; Kerr et al., 1991; Schindler et al., 1992; Shuai et al.,
1992). These possibilities are by no means mutually exclusive, and in principle phosphorylation can be
responsible for simultaneous regulation at several distinct levels. With the exception of certain signal
transduction proteins (Darnell et al., 1994), all examples of this type of regulation have involved
phosphorylation at serine or threonine residues.

It has been demonstrated that the activity of HNF4α is post-translationally regulated by tyrosine
phosphorylation, providing an example of a non-signal transduction factor modulated by this modification.
The HNF4α polypeptide (SEQ ID NO:79) contains 12 tyrosine residues scattered throughout the
DNA-binding, dimerization, and putative ligand-binding domains (Sladek et al., 1990) which could be
potential phosphorylation sites. It seems that the tyrosine phosphorylation of HNF4α is required for its
DNA-binding activity. It has been shown that the transcriptionally active form of HNF4α is localized in
specific subnuclear domains. This intranuclear distribution depends directly or indirectly on tyrosine
phosphorylation, suggesting the existence of an additional control mechanism at the level of subnuclear
targeting playing a role in transcription regulation.

Hepatocyte nuclear factor 4α (HNF4α) is a positive-acting transcription factor which is
expressed very early in embryo development and is essential to liver development and function (reviewed
in Sladek, 1993 and Sladek, 1994). Mouse HNF4α mRNA appears in the primary endoderm of implanting
blastocysts at embryonic day 4.5 and in the liver and gut primordia at day 8.5 (Duncan et al., 1994),
while mice deficient in HNF4α do not survive past day 9 postcoitus (Chen et al., 1994).

HNF4α has also been proposed to be responsible for the final commitment for cells to
differentiate into hepatocytes (Nagy et al., 1994). In adult rodents, HNF4α is located primarily in the
liver, kidney, and intestine, and in insects HNF4α is found in the equivalent tissues (Sladek et al., 1990;
Zhong et al., 1993). HNF4α is known to activate a wide variety of essential genes, including those
involved in cholesterol, fatty acid, and glucose metabolism; blood coagulation; detoxification mechanisms;
hepatitis B virus infections; and liver differentiation (reviewed in Sladek, 1993 and Sladek, 1994).

HNF4α is a member of the superfamily of ligand-dependent transcription factors, which includes
the steroid hormone receptors, thyroid hormone receptor (TR), vitamin A receptor, and vitamin D receptor
(VDR), as well as a large number of receptors for which ligands have not yet been identified, the so-called
orphan receptors (reviewed in Landers and Spelsberg, 1992; O'Malley and Conneely, 1992; Parker, 1993;
and Tsai and O'Malley, 1994). All receptors are characterized by two conserved domains: the zinc finger
region, which mediates DNA binding, and a large hydrophobic domain which mediates protein
dimerization, transactivation, and ligand binding.

Whether HNF4α responds to a ligand is not known, but it has been shown to activate
transcription in the absence of an exogenously added ligand (Hall et al., 1994; Kuo et al., 1992; Metzger
et al., 1993; Mietus et al., 1992; Reijnen et al., 1992; Sladek et al., 1990). HNF4α is also highly
conserved with the Drosophila HNF-4, containing 91% amino acid sequence identity to the rat HNF4α in
the DNA binding domain and 68% identity in the large hydrophobic domain (Zhong et al., 1993).

The members of the receptor superfamily have been classified in a variety of ways, one of which
is by their ability to dimerize with themselves and with other members of the superfamily. For example,
the steroid hormone receptors, glucocorticoid, mineralocorticoid, and progesterone receptors (GR, MR,
and PR, respectively), all bind DNA and activate transcription as homodimers. They are present in the
cytoplasm complexed with heat shock proteins (HSP) until the presence of the appropriate ligand disrupts
the complex, allowing the receptors to translocate to the nucleus (reviewed in Freedman and Luisi, 1993;
O'Malley and Tsai, 1993; and Tsai and O'Malley, 1994). On the other hand, the retinoic acid receptor
(RAR) and retinoid X receptor (RXR), as well as the VDR, peroxisome proliferator-activated receptor
(PPAR), and TR, which do not bind HSP and reside primarily in the nucleus, all bind DNA and activate
transcription not only as homodimers but also as heterodimers (reviewed in Giguere, 1994; Parker, 1993;
and Stunnenberg, 1993). Several of the nuclear receptors bind DNA very inefficiently, if at all, as
homodimers (RXRα, RAR, VDR, TR, and PPAR) but bind DNA well as heterodimers (reviewed in Giguere,
1994 and Stunnenberg, 1993). At least two of the receptors (RAR and TR) form heterodimers in solution
with RXRα (Hermann et al., 1992; Kurokawa et al., 1993; Zhang et al., 1992).

The most common dimerization partner for all of these receptors is RXRα. The third class of
receptors identified to date reside in both the nucleus and the cytoplasm and bind DNA preferentially as
monomers (NGFI-B, FTZ-F1, steroidogenic factor 1 (SF-1), and RORα1) (Giguere et al., 1995; Kurachi et
al., 1994; Ohno et al., 1994).

HNF4α is very similar to the retinoid receptors, in particular to RXRα, in both amino acid
sequence and DNA binding specificity. Mouse RXRα is 60% identical to rat HNF4α in the DNA binding
domain and 44% identical in the large hydrophobic domain. In comparison, RARα, which readily
heterodimerizes with RXRα, is 61% identical to RXRα in the DNA binding domain and only 27% identical
in the large hydrophobic domain (Mangelsdorf et al., 1992). HNF4α and RXRα have also been shown to
share response elements from at least six different genes as well as a consensus site of a direct repeat of
AGGTCA separated by one nucleotide (referred to as DR+1) (Carter et al., 1994; Carter et al., 1993;
Garcia et al., 1993; Ge et al., 1994; Hall et al., 1994; Hall et al., 1992; Kekule et al., 1993; Ladias,
1994; Lucas et al., 1991; Nakshatri and Chambon, 1994; Widom et al., 1992). The structural and
functional similarities of HNF4α and RXRα suggest that HNF4α might heterodimerize with RXRα and/or
other receptors.
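
As an illustrative aid, the direct-repeat consensus mentioned above (two AGGTCA half sites separated by one nucleotide) can be expressed as a simple pattern search. The Python sketch below is offered under stated assumptions and is not part of the disclosed methods: it looks only for perfect half sites on the given strand, and the function name and test sequence are hypothetical.

```python
import re

# Half-site sequence quoted in the text for the direct-repeat consensus.
HALF_SITE = "AGGTCA"


def find_direct_repeats(sequence: str, spacer: int = 1) -> list:
    """Return (start, matched_site) pairs for two perfect half sites
    separated by `spacer` arbitrary nucleotides (spacer=1 is the DR+1 case)."""
    # The capturing group sits inside a lookahead so overlapping sites are reported.
    pattern = re.compile(
        "(?=(" + HALF_SITE + "[ACGT]{" + str(spacer) + "}" + HALF_SITE + "))"
    )
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


# Hypothetical sequence, invented for illustration only.
example_sequence = "ttAGGTCAaAGGTCAgg"
print(find_direct_repeats(example_sequence))  # prints [(2, 'AGGTCAAAGGTCA')]
```

Response elements found in native promoters frequently contain imperfect half sites, so a search of this kind should be regarded as a first-pass filter only.
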

Electrophoretic mobility shift analyses (EMSA) of HNF4α and RXRα proteins expressed in vivo
and in vitro showed that HNF4α in fact does not heterodimerize with RXRα on any one of a number of
response elements and that while HNF4α forms homodimers in solution in the absence of DNA, it does
not form heterodimers with RXRα. It has also been shown that HNF4α does not heterodimerize with a
number of other receptors on DNA, suggesting that the lack of heterodimerization is a general property of
HNF4α.

These studies led to the proposal that HNF4α defines a new subfamily of nuclear receptors
which are present exclusively in the nucleus, exist in solution and bind DNA as homodimers, and do not
form heterodimers with RXRα or other receptors.

HNF4α is a member of the steroid hormone receptor family. The members of this family have
been classified according to the amino acid sequence in the knuckle of the first zinc finger (referred to as
the P box), a region important for recognizing the sequence of the half site of the palindrome in hormone
response elements (Forman and Samuels, 1990). For example, members of the thyroid hormone receptor
subfamily contain the amino acid sequence EGCKG (SEQ ID NO:83) and bind to the thyroid response element
(TRE). Members of the estrogen receptor subfamily contain the amino acids EGCKA (SEQ ID NO:84) and
bind to estrogen response elements (ERE). The sequence of HNF4α is DGCKG (SEQ ID NO:85) and is
most similar to that of the thyroid hormone receptor subfamily. Despite this similarity it appears that
HNF4α does not bind TRE nor does it bind ERE, and the true ligand for HNF4α is as yet undetermined. The
screening methods of the present invention will lead one of ordinary skill in the art to elucidate such a
ligand or ligands.
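
The P box comparison above can be made concrete by counting residue differences between the five-residue sequences quoted in the text. The following Python sketch is illustrative only. The residue-by-residue mismatch count is an assumption chosen for simplicity rather than a classification scheme taken from the cited literature, and the names used are hypothetical.

```python
# P box sequences as quoted in the text.
REFERENCE_P_BOXES = {
    "thyroid hormone receptor subfamily": "EGCKG",  # SEQ ID NO:83
    "estrogen receptor subfamily": "EGCKA",         # SEQ ID NO:84
}
HNF4A_P_BOX = "DGCKG"                               # SEQ ID NO:85


def count_mismatches(first: str, second: str) -> int:
    """Count positions at which two equal-length peptide sequences differ."""
    if len(first) != len(second):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(first, second))


def closest_subfamily(query: str, references: dict = REFERENCE_P_BOXES) -> str:
    """Return the name of the reference P box with the fewest mismatches to the query."""
    return min(references, key=lambda name: count_mismatches(query, references[name]))


for name, p_box in REFERENCE_P_BOXES.items():
    print(name, count_mismatches(HNF4A_P_BOX, p_box))
print(closest_subfamily(HNF4A_P_BOX))
```

Under this simple measure the HNF4α P box differs from the thyroid hormone receptor subfamily sequence at one position and from the estrogen receptor subfamily sequence at two positions, which is consistent with the similarity noted above.
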

The present invention describes the exon-intron organization and partial sequence of the human
HNF4α gene. In addition, the inventors have screened the exons, flanking introns and minimal promoter
region for mutations in a group of 57 unrelated Japanese subjects with early-onset diabetes/MODY of
unknown cause. The results of these screens suggest that mutations in the HNF4α gene may cause
early-onset diabetes/MODY in Japanese but they are less common than mutations in the HNF1α/MODY3
gene. The information presented herein on the sequence of the HNF4α gene and its promoter region will
facilitate the search for mutations in other populations and studies of the role of this gene in determining
normal pancreatic β-cell function.

Furthermore, current understanding of the MODY1 form of diabetes is based on studies of only a
single family, the R-W pedigree. Here the inventors report the identification of a second family with
MODY1 and the first in which there has been a detailed characterization of hepatic function. The present
inventors demonstrate that MODY1 is primarily a disorder of β-cell function; however, the inventors have
ascertained that mutations in HNF4α may lead to α-cell as well as β-cell secretory defects or to a
reduction in pancreatic islet mass.

Hepatic Nuclear Factor 1β and DCoH

Human HNF1β is a homeodomain-containing transcription factor of 557 amino acids (type A) with
alternative splicing generating two other forms of 531 (type B) and 399 amino acids (type C) (Mendel
et al., 1991a; De Simone et al., 1991; Rey-Campos et al., 1991; Bach and Yaniv, 1993). The nucleic acid
and amino acid sequences for human HNF1β are given in SEQ ID NO:128 and SEQ ID NO:129, respectively.

HNF1β is structurally related to HNF1α and functions as a homodimer or a heterodimer with HNF1α.
These dimers are stabilized by the bifunctional protein DCoH/PCBD (Mendel et al., 1991b; Citron et al.,
1992), which binds to the dimerization domain of HNF1, forming a heterotetrameric complex and
enhancing transcriptional activity. As a homotetramer, PCBD is involved in the regeneration of
tetrahydrobiopterin, an essential cofactor of phenylalanine hydroxylase and other mono-oxygenases,
catalyzing the conversion of 4a-hydroxytetrahydrobiopterin to quinonoid-dihydrobiopterin (Citron et al.,
1993; Johnen et al., 1995). Loss-of-function mutations in PCBD are associated with a rare autosomal
recessive form of mild hyperphenylalaninemia. HNF1β and DCoH mRNA are expressed in mouse
pancreatic islets, implying that they may function together with HNF1α to regulate gene expression in
this tissue. Human DCoH is a protein of 104 amino acids (including the initiating methionine) (Thony
et al., 1995) and functions as described herein below.

MODY-Type Diabetes is a Manifestation of Defects in Hepatocyte Nuclear Factors

It is established that all forms of Type 2 diabetes are associated with profound insulin secretory
defects which include loss of the first phase response to intravenous glucose, delayed and blunted
responses to ingestion of a mixed meal, loss of the normal oscillatory patterns of insulin secretion, and
increased secretion of proinsulin and proinsulin-like products. The molecular basis of these secretory
defects in humans is unknown, although in rats it has been shown that there are global changes in gene
expression in the islets of diabetic and prediabetic animals. One such global alteration is the reduction in
the levels of mRNAs encoding many pancreatic islet-specific proteins. This defect in gene expression
would be compatible with decreased levels of a master transcription factor whose levels affect the
expression of a whole array of downstream genes.

The present invention predicts that the β-cell dysfunction and insulin secretory defects
associated with MODY3 are a result of mutations in HNF1α; furthermore, it demonstrates that the β-cell
dysfunction associated with MODY1 is a result of mutations in HNF4α.

The features of MODY type diabetes are very similar to those of late-onset Type 2 diabetes.
Hence, acquired defects in the expression of HNF1α, HNF4α and HNF1β, respectively, may well occur in
late-onset diabetes and lead to β-cell dysfunction and insulin secretory defects in this form of diabetes.
The identification of agents that activate transcription of HNF1α, HNF1β and HNF4α will be therapeutic
for the treatment of MODY, as well as late-onset Type 2 diabetes. The present invention details methods
for the identification of such agents, which will then be used to increase the expression of HNF1α,
HNF1β and HNF4α, which in turn will lead to the increased transcription/expression or activation of
β-cell genes such as insulin.

It is clear from the present invention that hepatocyte nuclear factors, their expression, regulation
and modification have far-reaching implications in diabetes. To date, three of the four types of MODY
diabetes identified are predicted to affect gene expression. Other forms of MODY cannot be ruled out;
for example, genetic linkage studies predict the presence of additional MODY genes, the chromosomal
localization of which is presently unknown.

The absolute HNF4α dependence of the HNF1α promoter, coupled with evidence of the ability of
HNF4α to rescue endogenous HNF1α expression, is indicative of HNF4α being an essential regulator of
HNF1α (FIG. 6). Thus activation or repression of HNF4α will result in an indirect activation or repression
of HNF1α. The present invention elucidates methods for identifying factors responsible for modulating
HNF4α expression and/or activity.

HNF1β, also known as vHNF1, is closely related to HNF1α and is able to form heterodimers with
HNF1α. Dimerization between members of classes of transcription factors appears to solve the problem
of controlling expression of a very large number of genes. An obvious advantage of the dimerization ability
of a transcription factor is that it provides an opportunity to diversify the number of regulatory
mechanisms that can be associated with a single regulatory DNA binding site. Another advantage lies in
the possibility of translating subtle alterations in the relative levels of expression of members of a
dimerization pair into a substantial quantitative effect on transcription.

FIG. 6 summarizes the different factors involved in the regulation of expression and activity of
the HNF transcription factors described above. From the inventors' investigations it is conceivable that
aberrations at any point along this pathway, or any factors affecting this pathway directly or indirectly,
will result in β-cell dysfunction and diabetes mellitus, either as MODY or late-onset diabetes.

The present invention has shown that mutations in HNF1α are clearly responsible for MODY3
type diabetes. As discussed earlier, HNF1α binds to DNA as a dimer; this can be either a homodimer or a
heterodimer with HNF1β (SEQ ID NO: 80). The two forms of HNF1 are expressed in comparable amounts
in the liver, but there is a three-fold higher expression of HNF1β in the kidney as compared to HNF1α.
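
As an illustration of how relative expression levels could shift the mix of dimers, the following sketch computes the expected fraction of each dimer species. It rests on an assumption that is not part of the disclosure: that the two monomers pair at random, with no preference for homodimers or heterodimers and no contribution from DCoH. The function name `dimer_fractions` is hypothetical.

```python
def dimer_fractions(alpha: float, beta: float) -> dict[str, float]:
    """Expected dimer fractions for given relative monomer levels.

    Assumption (not from the source): monomers associate at random,
    so species follow the binomial proportions f^2, 2f(1-f), (1-f)^2.
    """
    if alpha < 0 or beta < 0 or alpha + beta == 0:
        raise ValueError("levels must be non-negative and not both zero")
    f = alpha / (alpha + beta)  # fraction of monomers that are HNF1-alpha
    return {
        "alpha_alpha": f * f,            # HNF1-alpha homodimer
        "alpha_beta": 2 * f * (1 - f),   # heterodimer
        "beta_beta": (1 - f) * (1 - f),  # HNF1-beta homodimer
    }


# Comparable amounts (liver) versus three-fold more HNF1-beta (kidney).
liver = dimer_fractions(1.0, 1.0)
kidney = dimer_fractions(1.0, 3.0)
print(liver)
print(kidney)
```

Under this random-pairing assumption, a three-fold excess of HNF1β reduces the HNF1α homodimer from one quarter to one sixteenth of all dimers, which shows how a modest change in relative expression could have a larger effect on the population of dimers.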

HNF1β lacks the transcriptional activity attributable to HNF1α. One potential consequence of
this observation, in combination with its ability to dimerize with HNF1α, is that HNF1β is likely to be a
negative regulator of HNF1α transcriptional activity. This possibility is suggested by the presence of
vHNF1 in systems that do not express the majority of hepatocyte-specific gene products (Baumhueter et
al., 1988). However, studies by Mendel et al. (1991) were unable to confirm this observation.

Studies by Mendel et al. (1991) indicated that a dimerization cofactor of HNF1 (DCoH) may
increase the stability of HNF1α dimers. Thus, it is suggested that DCoH has the potential to restrict the
activity of HNF1α and/or HNF1β. There are a number of hypotheses as to how DCoH affects HNF1
activation of transcription. HNF1α is a monomer in solution and can only bind DNA as a dimer; the
presence of DCoH favors the formation of the dimeric HNF1α. Alternatively, it is plausible that DCoH
induces a conformational change in HNF1α to create a more potent transcriptional activator, either
directly or by allowing interaction with other proteins, for example HNF1β. Yet another alternative is
that DCoH decreases the rate of HNF1α degradation, thereby stabilizing HNF1α and potentiating the
effects of HNF1α.

The present invention demonstrates that MODY4, which was previously uncharacterized, is a
manifestation of defects in HNF1β. The present invention describes specific mutations in HNF1β that
have led to MODY4 in certain individuals. In light of these observations, there are described herein
methods for the identification and isolation of factors involved in the activity of HNF1β and DCoH with a
view to obtaining insights into therapeutic intervention in diabetes.

C. In vitro Screening Assays for Candidate Substances

Certain aspects of this invention concern methods for conveniently evaluating candidate
substances to identify compounds capable of stimulating HNF1α-, HNF1β- or HNF4α-mediated
transcription. Such compounds will be capable of promoting gene expression, and thus can be said to
have up-regulating activity. Inasmuch as increased gene expression of, for example, the insulin gene in
the body functions to alleviate the symptoms of diabetes, any positive substances identified by the
assays of the present invention will be anti-diabetic drugs. Before human administration, such
compounds would be rigorously tested using conventional animal models known to those of skill in the
art.

Successful candidate substances may function in the absence of mutations in HNF1α, HNF1β or
HNF4α, in which case the candidate compound may be termed a "positive stimulator" of HNF1α, HNF1β
or HNF4α, respectively. Alternatively, such compounds may stimulate transcription in the presence of
mutated HNF1α, HNF1β or HNF4α, overcoming the effects of the mutations, i.e., function to oppose
HNF1α-mutant, and/or HNF1β-mutant, and/or HNF4α-mutant mediated diabetes, and thus may be termed
an "HNF1α mutant agonist", "HNF1β mutant agonist" or "HNF4α mutant agonist", respectively. Compounds
may even be discovered which combine all three of these actions. Although the agonist class of compounds
may ultimately seem to be the most desirable, compounds of either class will likely be useful therapeutic
agents for use in stimulating gene expression and combating MODY1, MODY3, MODY4, and late-onset
Type 2 diabetes in human subjects.

Candidates for HNF1α

As HNF1α is herein shown to be linked to MODY3 type diabetes, one method by which to identify a
candidate substance capable of stimulating HNF1α-mediated transcription in diabetes is based upon
specific protein:DNA binding. Accordingly, to conduct such an assay, one may prepare an HNF1α binding
protein composition, such as recombinant HNF1α, and determine the ability of a candidate substance to
increase HNF1α protein binding to a DNA segment including a complementary HNF1α binding sequence,
i.e., to increase the amount or the binding affinity of a protein:DNA complex.

This generally would be achieved using two parallel assays, one of which contains HNF1α and
the specific DNA alone and one of which contains HNF1α, DNA and the candidate substance
composition. One would perform each assay under conditions, and for a period of time, effective to allow
the formation of protein:DNA complexes, and one would then separate the bound protein:DNA complexes
from any unbound protein or DNA and measure the amount of the protein:DNA complexes. An increase in
the amount of the bound protein:DNA complex formed in the presence of the candidate substance would
be indicative of a candidate substance capable of promoting HNF1α binding, and thus, capable of
stimulating HNF1α-mediated transcription.

In such binding assays, the amount of the protein:DNA complex may be measured, after the
removal of unbound species, by detecting a label, such as a radioactive or enzymatic label, which has
been incorporated into the original HNF1α protein composition or recombinant protein or HNF1α
containing DNA segment. Alternatively, one could detect the protein portion of the complex by means of
an antibody directed against the protein, such as those disclosed herein.
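
The comparison between the two parallel assays can be sketched as a simple fold-change calculation on the measured label signal. This is a minimal sketch only: the function names, the replicate readings and the 1.5-fold cut-off are hypothetical choices for illustration and are not specified in the disclosure.

```python
from statistics import mean


def fold_change(control: list[float], treated: list[float]) -> float:
    """Ratio of mean bound-complex signal with candidate to without."""
    baseline = mean(control)
    if baseline <= 0:
        raise ValueError("control signal must be positive")
    return mean(treated) / baseline


def promotes_binding(control: list[float], treated: list[float],
                     threshold: float = 1.5) -> bool:
    """Flag a candidate when the complex signal rises by at least
    `threshold`-fold. The default threshold is an arbitrary example."""
    return fold_change(control, treated) >= threshold


# Hypothetical label counts from replicate wells.
without_candidate = [100, 110, 90]   # HNF1-alpha + DNA alone
with_candidate = [210, 190, 200]     # HNF1-alpha + DNA + candidate
print(fold_change(without_candidate, with_candidate))
print(promotes_binding(without_candidate, with_candidate))
```

In practice the cut-off would be set from the assay's own replicate variability rather than fixed in advance.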

Preferred binding assays are those in which either the HNF1α protein, recombinant protein or
purified composition or the HNF1α-containing DNA segment is bound to a solid support and contacted
with the other component to allow complex formation. Unbound protein or DNA components are then
separated from the protein:DNA complexes by washing and the amount of the remaining bound complex
quantitated by detecting the label or with antibodies. Such DNA binding assays form the basis of filter
binding and microtiter plate-type assays and can be performed in a semi-automated manner to enable
analysis of a large number of candidate substances in a short period of time. Electrophoretic methods,
such as the gel-shift assay disclosed herein, could also be employed to separate unbound protein or DNA
from bound protein:DNA complexes, but such labor-intensive methods are not preferred.

Assays such as those described above are initially directed to identifying positive stimulator
candidate substances and do not, by themselves, address the activity of the substance in the presence of
HNF1α mutants. However, such positive regulators may also prove to act as HNF1α mutant agonists,
and in any event, would likely have utility in transcriptional promotion, either in vitro or in vivo. Positive
regulators would likely be further evaluated to assess the effects of HNF1α mutants on their action, for
example, by employing a cellular reporter gene assay such as those described herein below.

Virtually any candidate substance may be analyzed by these methods, including compounds which
may interact with HNF1α binding protein(s), HNF1α or protein:DNA complexes, and also substances such
as enzymes which may act by physically altering one of the structures present. Of course, any compound
isolated from natural sources such as plants, animals or even marine, forest or soil samples, may be
assayed, as may any synthetic chemical or recombinant protein.

Another potential method for stimulating HNF1α-mediated transcription is to prepare an HNF1α
protein composition and to modify the protein composition in a manner effective to increase HNF1α
protein binding to a DNA segment including the HNF1α protein binding sequence. The binding assays
would be performed in parallel, similar to those described above, allowing the native and modified HNF1α
binding protein to be compared. In addition to phosphatases and kinases, other agents, including
proteases and chemical agents, could be employed to modify HNF1α binding protein. The present
invention, with the cloning of mutant HNF1α cDNA, also opens the way for genetically engineering
HNF1α protein to promote gene transcription in diabetes. In this regard, the mutation of potential
phosphorylation sites and/or the modification or deletion of other domains is contemplated.

Candidates for HNF4α binding

The criteria shown above for screening of modulators of HNF1α are also true of HNF4α. HNF4α
is a member of the steroid hormone receptor superfamily; however, the ligand for HNF4α is unknown.
The identification of the endogenous ligand for HNF4α binding would be an important step towards
elucidating the mechanisms of eukaryotic gene control, and would also provide biomedical science with a
powerful tool by which to regulate specific gene expression. Such a development would lead to numerous
useful applications in the pharmaceutical and biotechnological industries. Although many applications are
envisioned, one particularly useful application would be as the central component in screening assays to
identify new classes of pharmacologically active substances which may be employed to manipulate, and
particularly, to promote, the transcription of genes whose expression is altered in diabetes.

Hence HNF4α would be of great use in identifying agents to combat MODY and Type 2 diabetes.
An anti-diabetic agent isolated by the screening methods of the present invention would act to promote
the cellular transcription or function of HNF4α, which would in turn serve to increase transcription of
genes whose activity is regulated by HNF4α (for example HNF1α), thereby increasing the transcription of
genes involved in diabetes and alleviating the symptoms of diabetes.

Candidates for HNF1β binding

The criteria shown above for screening of modulators of HNF1α and HNF4α are also true of
HNF1β. HNF1β is a 557 amino acid protein that is structurally related to HNF1α and functions as a
homodimer and as a heterodimer with HNF1α. These dimers are stabilized by DCoH. The identification of
factors that affect this dimerization, or any of the factors involved in the heterotetrameric complex, will
provide useful compounds for the modulation of transcriptional activity. Such a development would lead
to numerous useful applications in the pharmaceutical and biotechnological industries. Although many
applications are envisioned, one particularly useful application would be as the central component in
screening assays to identify new classes of pharmacologically active substances which may be employed
to manipulate, and particularly, to promote, the transcription of genes whose expression is altered in
diabetes.

Hence HNF1β would be of great use in identifying agents to combat MODY and Type 2 diabetes.
An anti-diabetic agent isolated by the screening methods of the present invention would act to promote
the cellular transcription or function of HNF1β, which would in turn serve to increase transcription of
genes whose activity is regulated by HNF1β (for example HNF1α), thereby increasing the transcription of
genes involved in diabetes and alleviating the symptoms of diabetes.

D. Reporter Genes and Cell-Based Screening Assays

Cellular assays also are available for screening candidate substances to identify those capable of
stimulating HNF1α-, HNF1β- and HNF4α-mediated transcription and gene expression. In these assays,
the increased expression of any natural or heterologous gene under the control of a functional HNF1α,
HNF1β or HNF4α protein may be employed as a measure of stimulatory activity, although the use of
reporter genes is preferred. A reporter gene is a gene that confers on its recombinant host cell a readily
detectable phenotype that emerges only under specific conditions. In the present case, the reporter gene,
being under the control of a functional HNF1α, HNF1β or HNF4α protein, will generally be repressed
under conditions of MODY3, MODY4 or MODY1 diabetes, respectively, and will generally be expressed in
the MODY3, MODY4 or MODY1 non-diabetic conditions, respectively.

Reporter genes are genes which encode a polypeptide not otherwise produced by the host cell
which is detectable by analysis of the cell culture, e.g., by fluorometric, radioisotopic or
spectrophotometric analysis of the cell culture. Exemplary enzymes include luciferases, transferases,
esterases, phosphatases, proteases (tissue plasminogen activator or urokinase), and other enzymes
capable of being detected by their physical presence or functional activity. A reporter gene often used is
chloramphenicol acetyltransferase (CAT), which may be employed with a radiolabeled substrate, or
luciferase, which is measured fluorometrically.

Another class of reporter genes which confer detectable characteristics on a host cell are those
which encode polypeptides, generally enzymes, which render their transformants resistant against toxins,
e.g., the neo gene, which protects host cells against toxic levels of the antibiotic G418, and genes
encoding dihydrofolate reductase, which confers resistance to methotrexate. Genes of this class are not
generally preferred since the phenotype (resistance) does not provide a convenient or rapid quantitative
output. Resistance to antibiotic or toxin requires days of culture to confirm, or complex assay procedures
if other than a biological determination is to be made.

Other genes of potential for use in screening assays are those capable of transforming hosts to
express unique cell surface antigens, for example viral proteins such as HIV gp120 or herpes gD, which are
readily detectable by immunoassays. However, antigenic reporters are not preferred because, unlike
enzymes, they are not catalytic and thus do not amplify their signals.

The polypeptide products of the reporter gene are secreted, intracellular or, as noted above,
membrane bound polypeptides. If the polypeptide is not ordinarily secreted it is fused to a heterologous
signal sequence for processing and secretion. In other circumstances the signal is modified in order to
remove sequences that interdict secretion. For example, the herpes gD coat protein has been modified by
site directed deletion of its transmembrane binding domain, thereby facilitating its secretion (EP
139,417A). This truncated form of the herpes gD protein is detectable in the culture medium by
conventional immunoassays. Preferably, however, the products of the reporter gene are lodged in the
intracellular or membrane compartments. Then they can be fixed to the culture container, e.g., microtiter
wells, in which they are grown, followed by addition of a detectable signal generating substance such as
a chromogenic substrate for reporter enzymes.

The transcriptional promotion process which, in its entirety, leads to enhanced transcription is
termed "activation." The mechanism by which a successful candidate substance acts is not material,
since the objective is to promote HNF1α-, HNF1β- or HNF4α-mediated gene expression, or even to
promote gene expression in the presence of mutant HNF1α, HNF1β or HNF4α gene products, by
whatever means.

To create an appropriate vector or plasmid for use in such assays one would ligate the HNF1α
containing promoter, whether a hybrid or the native HNF1α promoter, to a DNA segment encoding the
reporter gene by conventional methods. Similar assays are also contemplated using HNF1β and HNF4α
promoters. The HNF1α, HNF1β or HNF4α promoter sequences may be obtained by in vitro synthesis or
recovered from genomic DNA and should be ligated upstream of the start codon of the reporter gene. The
present invention provides the promoter region for human HNF1α; a comparison of the sequence of the
promoter region of the human, rat, mouse, chicken and frog HNF1α genes is given in FIG. 22. There is
also provided herein a comparison of the sequences of the promoter regions of the human and mouse
HNF4α genes (FIG. 13). The partial sequence of the human HNF1β gene including promoter has also
been identified by the present inventors and deposited in the GenBank database under accession numbers
U90279-90287 and U96079. Any of these promoters may be particularly preferred in the present
invention. An AT-rich TATA box region should also be employed and should be located between the HNF
sequence and the reporter gene start codon. The region 3' to the coding sequence for the reporter gene
will ideally contain a transcription termination and polyadenylation site. The promoter and reporter gene
may be inserted into a replicable vector and transfected into a cloning host such as E. coli, the host
cultured and the replicated vector recovered in order to prepare sufficient quantities of the construction
for later transfection into a suitable eukaryotic host.

Host cells for use in the screening assays of the present invention will generally be mammalian
cells, and are preferably cell lines which may be used in connection with transient transfection studies.
Cell lines should be relatively easy to grow in large scale culture. Also, they should contain as little native
background as possible considering the nature of the reporter polypeptide. Examples include the Hep G2,
VERO, HeLa, human embryonic kidney (HEK)-293, CHO, WI38, BHK, COS-7 and MDCK cell lines, with
monkey CV-1 cells being particularly preferred.

The screening assay typically is conducted by growing recombinant host cells in the presence and
absence of candidate substances and determining the amount or the activity of the reporter gene. To
assay for candidate substances capable of exerting their effects in the presence of mutated HNF1α,
HNF1β and/or HNF4α gene products, one would make serial molar proportions of such gene products that
alter HNF1α-, HNF1β- and HNF4α-mediated expression. One would ideally measure the reporter signal
level after an incubation period that is sufficient to demonstrate mutant-mediated repression of signal
expression in controls incubated solely with mutants. Cells containing varying proportions of candidate
substances would then be evaluated for signal activation in comparison to the suppressed levels.

Candidates that demonstrate dose-related enhancement of reporter gene transcription or
expression are then selected for further evaluation as clinical therapeutic agents. The stimulation of
transcription may be observed in the absence of mutant HNF1α, HNF1β or HNF4α, in which case the
candidate compound might be a positive stimulator of HNF1α, HNF1β or HNF4α transcription,
respectively. Alternatively, the candidate compound might only give a stimulation in the presence of
mutated HNF1α, mutated HNF1β or mutated HNF4α protein, which would indicate that it functions to
oppose the mutation-mediated suppression of the gene expression. Candidate compounds of either class
might be useful therapeutic agents that would stimulate gene expression and thereby combat MODY
and Type 2 diabetes.
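
The selection logic described above can be sketched as two small checks: whether the reporter signal rises with dose, and in which background (with or without the mutant gene product) the rise is seen. This is an illustrative sketch only; the function names, the two-fold minimum and the sample readings are hypothetical and are not taken from the disclosure.

```python
def is_dose_related(doses: list[float], signals: list[float],
                    min_fold: float = 2.0) -> bool:
    """True when the signal never falls as dose rises and the highest
    dose gives at least `min_fold` times the lowest-dose signal."""
    ordered = [s for _, s in sorted(zip(doses, signals))]
    rising = all(b >= a for a, b in zip(ordered, ordered[1:]))
    return rising and ordered[0] > 0 and ordered[-1] / ordered[0] >= min_fold


def classify_candidate(responds_without_mutant: bool,
                       responds_with_mutant: bool) -> str:
    """Map the two screening outcomes onto the classes named in the text."""
    if responds_without_mutant:
        return "positive stimulator"
    if responds_with_mutant:
        return "mutant agonist"
    return "inactive"


# Hypothetical reporter readings; dose 0 is the untreated control.
doses = [0.0, 1.0, 10.0]
wild_type_cells = [100.0, 102.0, 101.0]   # no clear change
mutant_cells = [40.0, 70.0, 95.0]         # repression relieved with dose

print(classify_candidate(
    is_dose_related(doses, wild_type_cells),
    is_dose_related(doses, mutant_cells),
))
```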

E. Nucleic Acids

As described in the Examples, the present invention discloses the gene at the MODY3 locus of
chromosome 12, the MODY4 locus as being associated with HNF1β, and the gene at the MODY1 locus of
chromosome 20. Mutations in these genes are responsible for diabetes. The present invention discloses
mutations in the HNF1α, HNF1β and HNF4α genes identified by PCR techniques. The gene for the MODY3
locus has for the first time been identified as hepatocyte nuclear factor 1α, herein referred to as HNF1α.
The gene for the MODY1 locus has been identified as hepatocyte nuclear factor 4α (HNF4α). The gene for
the MODY4 locus has been identified as hepatocyte nuclear factor 1β (HNF1β).

In one embodiment of the present invention, the nucleic acid sequences disclosed herein find utility
as hybridization probes or amplification primers. In certain embodiments, these probes and primers consist
of oligonucleotide fragments. Such fragments should be of sufficient length to provide specific hybridization
to an RNA or DNA sample extracted from tissue. The sequences typically will be 10-20 nucleotides, but
may be longer. Longer sequences, e.g., 40, 50, 100, 500 and even up to full length, are preferred for certain
embodiments.

Nucleic acid molecules having contiguous stretches of about 10, 15, 17, 20, 30, 40, 50, 60, 75,
100 or 500 nucleotides from a sequence selected from the group comprising SEQ ID NO:1, SEQ ID NO:3,
SEQ ID NO:5, SEQ ID NO:7, HNF1α and its mutants are contemplated. In other embodiments nucleotides
from a sequence selected from the group comprising SEQ ID NO:78, SEQ ID NO:34, SEQ ID NO:35, SEQ ID
NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID
NO:52, SEQ ID NO:54, HNF4α and its mutants are contemplated. In still other embodiments nucleotides
from a sequence selected from the group comprising SEQ ID NO:-, SEQ ID NO:-, SEQ ID NO:-, SEQ ID
NO:-, SEQ ID NO:-, SEQ ID NO:-, SEQ ID NO:-, SEQ ID NO:-, SEQ ID NO:-, SEQ ID NO:-, SEQ ID NO:-,
SEQ ID NO:-, HNF1β and its mutants are contemplated. Molecules that are complementary to the above
mentioned sequences and that bind to these sequences under high stringency conditions also are
contemplated. These probes will be useful in a variety of hybridization embodiments, such as Southern and
northern blotting. In some cases, it is contemplated that probes may be used that hybridize to multiple target
sequences without compromising their ability to effectively diagnose diabetes (MODY1, MODY3 and
MODY4). In certain embodiments, it is contemplated that multiple probes may be used for hybridization to a
single sample.

Various probes and primers can be designed around the disclosed nucleotide sequences. Primers
may be of any length but, typically, are 10-20 bases in length. By assigning numeric values to a sequence,
for example, the first residue is 1, the second residue is 2, etc., an algorithm defining all primers can be
proposed:

n to n + y

where n is an integer from 1 to the last number of the sequence and y is the length of the primer
minus one, where n + y does not exceed the last number of the sequence. Thus, for a 10-mer, the probes
correspond to bases 1 to 10, 2 to 11, 3 to 12 ... and so on. For a 15-mer, the probes correspond to bases 1
to 15, 2 to 16, 3 to 17 ... and so on. For a 20-mer, the probes correspond to bases 1 to 20, 2 to 21, 3 to 22
... and so on.
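
The windowing rule above can be sketched as a short generator. The function name `enumerate_primers` and the 12-base example sequence are hypothetical and serve only to show the indexing; positions are 1-based, as in the text.

```python
from typing import Iterator


def enumerate_primers(sequence: str, length: int) -> Iterator[tuple[int, int, str]]:
    """Yield (start, end, primer) for every window 'n to n + y'.

    y is the primer length minus one, and n runs from 1 for as long as
    n + y does not exceed the last position of the sequence.
    """
    if length < 1:
        raise ValueError("primer length must be at least 1")
    y = length - 1
    last = len(sequence)
    for n in range(1, last - y + 1):
        # Convert the 1-based inclusive range [n, n + y] to a Python slice.
        yield n, n + y, sequence[n - 1:n + y]


# Toy 12-base sequence (not one of the disclosed SEQ ID NOs).
for start, end, primer in enumerate_primers("ACGTACGTACGT", 10):
    print(start, end, primer)
```

A sequence of L bases therefore yields L - y windows for a primer of length y + 1.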

The values of n in the algorithm above for the nucleic acid sequences are: SEQ ID NO:1, n = 3238 for
HNF1α; SEQ ID NO:78, n = 1441 for HNF4α; SEQ ID NO:128 for HNF1β.

The use of a hybridization probe of between 17 and 100 nucleotides in length allows the formation
of a duplex molecule that is both stable and selective. Molecules having complementary sequences over
stretches greater than 20 bases in length are generally preferred, in order to increase stability and selectivity
of the hybrid, and thereby improve the quality and degree of particular hybrid molecules obtained. One will
generally prefer to design nucleic acid molecules having stretches of 20 to 30 nucleotides, or even longer
where desired. Such fragments may be readily prepared by, for example, directly synthesizing the fragment
by chemical means or by introducing selected sequences into recombinant vectors for recombinant
production.

Accordingly, the nucleotide sequences of the invention may be used for their ability to selectively
form duplex molecules with complementary stretches of genes or RNAs or to provide primers for
amplification of DNA or RNA from tissues. Depending on the application envisioned, one will desire to employ
varying conditions of hybridization to achieve varying degrees of selectivity of probe towards target
sequence.

For applications requiring high selectivity, one will typically desire to employ relatively stringent
conditions to form the hybrids, e.g., one will select relatively low salt and/or high temperature conditions,
such as provided by about 0.02 M to about 0.10 M NaCl at temperatures of about 50°C to about 70°C.
Such high stringency conditions tolerate little, if any, mismatch between the probe and the template or
target strand, and would be particularly suitable for isolating specific genes or detecting specific mRNA
transcripts. It is generally appreciated that conditions can be rendered more stringent by the addition of
increasing amounts of formamide.

For certain applications, for example, substitution of nucleotides by site directed mutagenesis, it is 
appreciated that lower stringency conditions are required. Under these conditions, hybridization may occur 
even though the sequences of probe and target strand are not perfectly complementary, but are mismatched 
at one or more positions. Conditions may be rendered less stringent by increasing salt concentration and 
decreasing temperature. For example, a medium stringency condition could be provided by about 0.1 to 0.25 
M NaCl at temperatures of about to about 55°C, while a low stringency condition could be provided
by about 0.15 M to about 0.9 M salt, at temperatures ranging from about 20°C to about 55°C. Thus, 
hybridization conditions can be readily manipulated depending on the desired results. 

In other embodiments, hybridization may be achieved under conditions of, for example, 50 mM Tris-
HCl (pH 8.3), 75 mM KCl, 3 mM MgCl2, 1.0 mM dithiothreitol, at temperatures between approximately
20°C to about 37°C. Other hybridization conditions utilized could include approximately 10 mM Tris-HCl
(pH 8.3), 50 mM KCl, 1.5 mM MgCl2, at temperatures ranging from approximately 40°C to about 72°C.

In certain embodiments, it will be advantageous to employ nucleic acid sequences of the present
invention in combination with an appropriate means, such as a label, for determining hybridization. A wide
variety of appropriate indicator means are known in the art, including fluorescent, radioactive, enzymatic or
other ligands, such as avidin/biotin, which are capable of being detected. In preferred embodiments, one may
desire to employ a fluorescent label or an enzyme tag such as urease, alkaline phosphatase or peroxidase,
instead of radioactive or other environmentally undesirable reagents. In the case of enzyme tags, colorimetric
indicator substrates are known that can be employed to provide a detection means visible to the human eye
or spectrophotometrically, to identify specific hybridization with complementary nucleic acid-containing
samples.

In general, it is envisioned that the hybridization probes described herein will be useful both as
reagents in solution hybridization, as in PCR, for detection of expression of corresponding genes, as well as
in embodiments employing a solid phase. In embodiments involving a solid phase, the test DNA (or RNA) is
adsorbed or otherwise affixed to a selected matrix or surface. This fixed, single-stranded nucleic acid is then
subjected to hybridization with selected probes under desired conditions. The selected conditions will depend
on the particular circumstances based on the particular criteria required (depending, for example, on the
G + C content, type of target nucleic acid, source of nucleic acid, size of hybridization probe, etc.). Following
washing of the hybridized surface to remove non-specifically bound probe molecules, hybridization is
detected, or even quantified, by means of the label.

It will be understood that this invention is not limited to the particular probes disclosed herein and
particularly is intended to encompass at least nucleic acid sequences that are hybridizable to the disclosed
sequences or are functional analogs of these sequences.

For applications in which the nucleic acid segments of the present invention are incorporated into
vectors, such as plasmids, cosmids or viruses, these segments may be combined with other DNA sequences,
such as promoters, polyadenylation signals, restriction enzyme sites, multiple cloning sites, other coding
segments, and the like, such that their overall length may vary considerably. It is contemplated that a nucleic
acid fragment of almost any length may be employed, with the total length preferably being limited by the
ease of preparation and use in the intended recombinant DNA protocol.

DNA segments encoding a specific gene may be introduced into recombinant host cells and
employed for expressing a specific structural or regulatory protein. Alternatively, through the application of
genetic engineering techniques, subportions or derivatives of selected genes may be employed. Upstream
regions containing regulatory regions such as promoter regions may be isolated and subsequently employed
for expression of the selected gene.

In an alternative embodiment, the HNF1α, HNF1β or HNF4α nucleic acids employed may actually
encode antisense constructs that hybridize, under intracellular conditions, to an HNF1α, HNF1β or HNF4α
nucleic acid, respectively. The term "antisense construct" is intended to refer to nucleic acids, preferably
oligonucleotides, that are complementary to the base sequences of a target DNA or RNA. Antisense
oligonucleotides, when introduced into a target cell, specifically bind to their target nucleic acid and
interfere with transcription, RNA processing, transport, translation and/or stability.

Antisense constructs may be designed to bind to the promoter and other control regions, exons,
introns or even exon-intron boundaries of a gene. Antisense RNA constructs, or DNA encoding such
antisense RNAs, may be employed to inhibit gene transcription or translation or both within a host cell,
either in vitro or in vivo, such as within a host animal, including a human subject. Nucleic acid sequences
which comprise "complementary nucleotides" are those which are capable of base-pairing according to
the standard Watson-Crick complementarity rules. That is, the larger purines will base pair with the
smaller pyrimidines to form combinations of guanine paired with cytosine (G:C) and adenine paired with
either thymine (A:T), in the case of DNA, or uracil (A:U), in the case of RNA. Inclusion
of less common bases such as inosine, 5-methylcytosine, 6-methyladenine, hypoxanthine and others in
hybridizing sequences does not interfere with pairing.

As used herein, the term "complementary" means nucleic acid sequences that are substantially
complementary over their entire length and have very few base mismatches. For example, nucleic acid
sequences of fifteen bases in length may be termed complementary when they have a complementary
nucleotide at thirteen or fourteen positions, with only one or two mismatches. Naturally, nucleic acid
sequences which are "completely complementary" will be nucleic acid sequences which are entirely
complementary throughout their entire length and have no base mismatches.
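
The Watson-Crick pairing rules and the mismatch-based definitions above can be sketched for DNA as follows. The helper names, the example 15-mer and the default allowance of two mismatches (matching the "thirteen or fourteen positions" example for a fifteen-base sequence) are illustrative assumptions rather than part of the disclosure.

```python
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}  # standard DNA base pairs


def reverse_complement(dna: str) -> str:
    """Sequence that pairs with `dna` when read in the opposite direction."""
    return "".join(PAIRS[base] for base in reversed(dna.upper()))


def count_mismatches(target: str, probe: str) -> int:
    """Number of probe positions that fail to pair with the target."""
    if len(target) != len(probe):
        raise ValueError("target and probe must have the same length")
    perfect = reverse_complement(target)
    return sum(1 for want, got in zip(perfect, probe.upper()) if want != got)


def complementarity(target: str, probe: str, max_mismatches: int = 2) -> str:
    """Label a probe using the terms defined in the text."""
    mismatches = count_mismatches(target, probe)
    if mismatches == 0:
        return "completely complementary"
    if mismatches <= max_mismatches:
        return "complementary"
    return "not complementary"


target = "ATGCATGCATGCATG"                  # toy 15-base target
perfect_probe = reverse_complement(target)
one_off_probe = perfect_probe[:-1] + "A"     # single mismatch at the last base
print(complementarity(target, perfect_probe))
print(complementarity(target, one_off_probe))
```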

Other sequences with lower degrees of homology also are contemplated. For example, an
antisense construct which has limited regions of high homology, but also contains a non-homologous
region (e.g., a ribozyme) could be designed. These molecules, though having less than 50% homology,
would bind to target sequences under appropriate conditions.

While all or part of the HNF1α, HNF1β or HNF4α gene sequence may be employed in the context
of antisense construction, short oligonucleotides are easier to make and increase in vivo accessibility.
However, both binding affinity and sequence specificity of an antisense oligonucleotide to its
complementary target increase with increasing length. It is contemplated that antisense
oligonucleotides of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80,
90, 100 or more base pairs will be used. One can readily determine whether a given antisense nucleic
acid is effective at targeting of the corresponding host cell gene simply by testing the constructs in vitro
to determine whether the endogenous gene's function is affected or whether the expression of related
genes having complementary sequences is affected.

In certain embodiments, one may wish to employ antisense constructs which include other
elements, for example, those which include C-5 propyne pyrimidines. Oligonucleotides which contain
C-5 propyne analogues of uridine and cytidine have been shown to bind RNA with high affinity and to be
potent antisense inhibitors of gene expression (Wagner et al., 1993).

Throughout this application, the term "expression construct" is meant to include any type of
genetic construct containing a nucleic acid coding for a gene product in which part or all of the nucleic
acid encoding sequence is capable of being transcribed. The transcript may be translated into a protein,
but it need not be. Thus, in certain embodiments, expression includes both transcription of a gene and
translation of a RNA into a gene product. In other embodiments, expression only includes transcription of
the nucleic acid, for example, to generate antisense constructs.

In preferred embodiments, the nucleic acid is under transcriptional control of a promoter. A
"promoter" refers to a DNA sequence recognized by the synthetic machinery of the cell, or introduced
synthetic machinery, required to initiate the specific transcription of a gene. The phrase "under
transcriptional control" means that the promoter is in the correct location and orientation in relation to
the nucleic acid to control RNA polymerase initiation and expression of the gene.

The term promoter will be used here to refer to a group of transcriptional control modules that
are clustered around the initiation site for RNA polymerase II. Much of the thinking about how promoters
are organized derives from analyses of several viral promoters, including those for the HSV thymidine
kinase (tk) and SV40 early transcription units. These studies, augmented by more recent work, have
shown that promoters are composed of discrete functional modules, each consisting of approximately 7-
20 bp of DNA, and containing one or more recognition sites for transcriptional activator or repressor
proteins.

At least one module in each promoter functions to position the start site for RNA synthesis. The
best known example of this is the TATA box, but in some promoters lacking a TATA box, such as the
promoter for the mammalian terminal deoxynucleotidyl transferase gene and the promoter for the
SV40 late genes, a discrete element overlying the start site itself helps to fix the place of initiation.

Additional promoter elements regulate the frequency of transcriptional initiation. Typically, these
are located in the region 30-110 bp upstream of the start site, although a number of promoters have
recently been shown to contain functional elements downstream of the start site as well. The spacing
between promoter elements frequently is flexible, so that promoter function is preserved when elements
are inverted or moved relative to one another. In the tk promoter, the spacing between promoter
elements can be increased to 50 bp apart before activity begins to decline. Depending on the promoter, it
appears that individual elements can function either co-operatively or independently to activate
transcription.

The particular promoter that is employed to control the expression of a nucleic acid is not
believed to be critical, so long as it is capable of expressing the nucleic acid in the targeted cell. Thus,
where a human cell is targeted, it is preferable to position the nucleic acid coding region adjacent to and
under the control of a promoter that is capable of being expressed in a human cell. Generally speaking,
such a promoter might include either a human or viral promoter. Preferred promoters include those
derived from HSV, and the HNF1α (see for example, FIG. 22), HNF1β or HNF4α promoter (see for example
FIG. 13). The partial sequence of the human HNF1β gene including promoter has also been identified by
the present inventors and deposited in the GenBank database under accession numbers U90279-90287
and U96079 (SEQ ID NO:128). Another preferred embodiment is the tetracycline controlled promoter.

In various other embodiments, the human cytomegalovirus (CMV) immediate early gene promoter,
the SV40 early promoter and the Rous sarcoma virus long terminal repeat can be used to obtain high-level
expression of a transgene. The use of other viral or mammalian cellular or bacterial phage promoters
which are well-known in the art to achieve expression of a transgene is contemplated as well, provided
that the levels of expression are sufficient for a given purpose. Tables 1 and 2 list several
elements/promoters which may be employed, in the context of the present invention, to regulate the
expression of a transgene. This list is not intended to be exhaustive of all the possible elements involved
in the promotion of transgene expression but, merely, to be exemplary thereof.

Enhancers were originally detected as genetic elements that increased transcription from a
promoter located at a distant position on the same molecule of DNA. This ability to act over a large
distance had little precedent in classic studies of prokaryotic transcriptional regulation. Subsequent work
showed that regions of DNA with enhancer activity are organized much like promoters. That is, they are
composed of many individual elements, each of which binds to one or more transcriptional proteins.

The basic distinction between enhancers and promoters is operational. An enhancer region as a
whole must be able to stimulate transcription at a distance; this need not be true of a promoter region or
its component elements. On the other hand, a promoter must have one or more elements that direct
initiation of RNA synthesis at a particular site and in a particular orientation, whereas enhancers lack
these specificities. Promoters and enhancers are often overlapping and contiguous, often seeming to
have a very similar modular organization.

Additionally, any promoter/enhancer combination (as per the Eukaryotic Promoter Data Base,
EPDB) could also be used to drive expression of a transgene. Use of a T3, T7 or SP6 cytoplasmic
expression system is another possible embodiment. Eukaryotic cells can support cytoplasmic
transcription from certain bacterial promoters if the appropriate bacterial polymerase is provided, either
as part of the delivery complex or as an additional genetic expression construct.

TABLE 1

PROMOTER

Immunoglobulin Heavy Chain        c-HA-ras
Immunoglobulin Light Chain        Insulin
T Cell Receptor                   Neural Cell Adhesion Molecule (NCAM)
HLA DQα and DQβ                   α1-Antitrypsin
β-Interferon                      H2B (TH2B) Histone
Interleukin-2                     Mouse or Type I Collagen
Interleukin-2 Receptor            Glucose Regulated Proteins (GRP94 and GRP78)
MHC Class II 5                    Rat Growth Hormone
MHC Class II HLA-DRα              Human Serum Amyloid A (SAA)
β-Actin                           Troponin I (TN I)
Muscle Creatine Kinase            Platelet-Derived Growth Factor
Prealbumin (Transthyretin)        Duchenne Muscular Dystrophy
Elastase I                        SV40
Metallothionein                   Polyoma
Collagenase                       Retroviruses
Albumin Gene                      Papilloma Virus
α-Fetoprotein                     Hepatitis B Virus
α-Globin                          Human Immunodeficiency Virus
β-Globin                          Cytomegalovirus
c-fos                             Gibbon Ape Leukemia Virus

TABLE 2

ELEMENT                               INDUCER

MT II                                 Phorbol Ester (TPA), Heavy metals
MMTV (mouse mammary tumor virus)      Glucocorticoids
β-Interferon                          poly(rI)X, poly(rc)
Adenovirus 5 E2                       Ela
c-jun                                 Phorbol Ester (TPA), H2O2
Collagenase                           Phorbol Ester (TPA)
Stromelysin                           Phorbol Ester (TPA), IL-1
SV40                                  Phorbol Ester (TPA)
Murine MX Gene                        Interferon, Newcastle Disease Virus
GRP78 Gene                            A23187
α-2-Macroglobulin                     IL-6
Vimentin                              Serum
MHC Class I Gene H-2kB                Interferon
HSP70                                 Ela, SV40 Large T Antigen
Proliferin                            Phorbol Ester-TPA
Tumor Necrosis Factor                 FMA
Thyroid Stimulating Hormone α Gene    Thyroid Hormone


Use of the baculovirus system will involve high level expression from the powerful polyhedrin
promoter.

One typically includes a polyadenylation signal to effect proper polyadenylation of the
transcript. The nature of the polyadenylation signal is not believed to be crucial to the successful
practice of the invention, and any such sequence may be employed. Preferred embodiments include the
SV40 polyadenylation signal and the bovine growth hormone polyadenylation signal, convenient and
known to function well in various target cells. Also contemplated as an element of the expression
cassette is a terminator. These elements can serve to enhance message levels and to minimize read
through from the cassette into other sequences.

A specific initiation signal also may be required for efficient translation of coding sequences.
These signals include the ATG initiation codon and adjacent sequences. Exogenous translational control
signals, including the ATG initiation codon, may need to be provided. One of ordinary skill in the art would
readily be capable of determining this and providing the necessary signals. It is well known that the
initiation codon must be "in-frame" with the reading frame of the desired coding sequence to ensure
translation of the entire insert. The exogenous translational control signals and initiation codons can be
either natural or synthetic. The efficiency of expression may be enhanced by the inclusion of appropriate
transcription enhancer elements (Bittner et al., 1987).
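
The in-frame requirement for an exogenous initiation codon can be checked mechanically. The following Python sketch is illustrative only: the function name, the index conventions (0-based positions) and the example construct are assumptions introduced here. It tests that the number of bases between a supplied ATG and the first base of the insert is a multiple of three and that no stop codon intervenes.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def atg_in_frame(construct: str, atg_pos: int, insert_pos: int) -> bool:
    """Return True if the ATG at atg_pos reads in frame into the insert at insert_pos."""
    seq = construct.upper()
    if seq[atg_pos:atg_pos + 3] != "ATG":
        raise ValueError("no ATG at the stated position")
    if insert_pos <= atg_pos or (insert_pos - atg_pos) % 3 != 0:
        return False  # insert starts out of frame
    # Walk codon by codon from the ATG up to the insert; a stop codon ends translation early.
    for i in range(atg_pos, insert_pos, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return False
    return True


# Hypothetical construct: leader, ATG, a 6-base linker, then the insert.
construct = "GCCACC" + "ATG" + "GGATCC" + "GTTTCT"
print(atg_in_frame(construct, atg_pos=6, insert_pos=15))
```

Adding or removing a single base from the linker in this example shifts the insert out of frame, which is the situation the text cautions against.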

In various embodiments of the invention, the expression construct may comprise a virus or
engineered construct derived from a viral genome. The ability of certain viruses to enter cells via
receptor-mediated endocytosis and to integrate into the host cell genome and express viral genes stably
and efficiently have made them attractive candidates for the transfer of foreign genes into mammalian
cells (Ridgeway, 1988; Nicolas and Rubenstein, 1988; Baichwal and Sugden, 1986; Temin, 1986). The
first viruses used as vectors were DNA viruses including the papovaviruses (simian virus 40, bovine
papilloma virus, and polyoma) (Ridgeway, 1988; Baichwal and Sugden, 1986) and adenoviruses
(Ridgeway, 1988; Baichwal and Sugden, 1986) and adeno-associated viruses. Retroviruses also are
attractive gene transfer vehicles (Nicolas and Rubenstein, 1988; Temin, 1986) as are vaccinia virus
(Ridgeway, 1988) and adeno-associated virus (Ridgeway, 1988). Such vectors may be used to (i)
transform cell lines in vitro for the purpose of expressing proteins of interest or (ii) to transform cells in
vitro or in vivo to provide therapeutic polypeptides in a gene therapy scenario.

In some embodiments, the vector is HSV. Because HSV is neurotropic, it has generated
considerable interest in treating nervous system disorders. Since insulin-secreting pancreatic β-cells
share many features with neurons, HSV may be useful for delivering genes to β-cells and for gene therapy
of diabetes. Moreover, the ability of HSV to establish latent infections in non-dividing neuronal cells
without integrating into the host cell chromosome or otherwise altering the host cell's metabolism, along
with the existence of a promoter that is active during latency, makes HSV an attractive vector. And
though much attention has focused on the neurotropic applications of HSV, this vector also can be
exploited for other tissues.

Another factor that makes HSV an attractive vector is the size and organization of the genome.
Because HSV is large, incorporation of multiple genes or expression cassettes is less problematic than in
other smaller viral systems. In addition, the availability of different viral control sequences with varying
performance (temporal, strength, etc.) makes it possible to control expression to a greater extent than in
other systems. It also is an advantage that the virus has relatively few spliced messages, further easing
genetic manipulations.

HSV also is relatively easy to manipulate and can be grown to high titers. Thus, delivery is less
of a problem, both in terms of volumes needed to attain sufficient MOI and in a lessened need for repeat
dosings.

F. Encoded Proteins

Once the entire coding sequence of a marker-associated gene has been determined, the gene can be
inserted into an appropriate expression system. The gene can be expressed in any number of different
recombinant DNA expression systems to generate large amounts of the polypeptide product, which can then
be purified and used to vaccinate animals to generate antisera with which further studies may be conducted.

Examples of expression systems known to the skilled practitioner in the art include bacteria such as
E. coli, yeast such as Saccharomyces cerevisiae and Pichia pastoris, baculovirus, and mammalian expression
systems such as in COS or CHO cells. In one embodiment, polypeptides are expressed in E. coli and in
baculovirus expression systems. A complete gene can be expressed or, alternatively, fragments of the gene
encoding portions of polypeptide can be produced.

In one embodiment, the gene sequence encoding the polypeptide is analyzed to detect putative
transmembrane sequences. Such sequences are typically very hydrophobic and are readily detected by the
use of standard sequence analysis software, such as MacVector (IBI, New Haven, CT). The presence of
transmembrane sequences is often deleterious when a recombinant protein is synthesized in many
expression systems, especially E. coli, as it leads to the production of insoluble aggregates that are difficult
to renature into the native conformation of the protein. Deletion of transmembrane sequences typically does
not significantly alter the conformation of the remaining protein structure.
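
As a rough illustration of how such software may flag putative transmembrane sequences, the Python sketch below averages Kyte & Doolittle (1982) hydropathy values over a sliding window. It is not a description of MacVector. The 19-residue window and the 1.6 cut-off are commonly cited conventions adopted here as assumptions, and the function name and test sequence are hypothetical.

```python
# Kyte & Doolittle (1982) hydropathy values, one-letter amino acid codes.
KD = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def hydrophobic_windows(protein: str, window: int = 19, threshold: float = 1.6):
    """Return (start, mean hydropathy) for every window at or above the threshold."""
    seq = protein.upper()
    hits = []
    for i in range(len(seq) - window + 1):
        mean = sum(KD[aa] for aa in seq[i:i + window]) / window
        if mean >= threshold:
            hits.append((i, round(mean, 2)))
    return hits


# Hypothetical protein: charged ends flanking a 19-residue stretch of leucine.
example = "MKRDE" + "L" * 19 + "DEKRS"
for start, mean in hydrophobic_windows(example):
    print(start, mean)
```

Windows reported by such a scan mark candidate segments whose coding sequence might be deleted before expression.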

Moreover, transmembrane sequences, being by definition embedded within a membrane, are
inaccessible. Therefore, antibodies to these sequences will not prove useful for in vivo or in situ studies.
Deletion of transmembrane-encoding sequences from the genes used for expression can be achieved by
standard techniques. For example, fortuitously-placed restriction enzyme sites can be used to excise the
desired gene fragment, or PCR-type amplification can be used to amplify only the desired part of the gene.
The skilled practitioner will realize that such changes must be designed so as not to change the translational
reading frame for downstream portions of the protein-encoding sequence.
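
The reading-frame constraint on such deletions reduces to a simple rule: the number of bases removed must be a multiple of three. A minimal Python sketch follows; the function name, the 0-based half-open coordinates and the example coding sequence are illustrative assumptions.

```python
def delete_segment_in_frame(cds: str, start: int, end: int) -> str:
    """Remove cds[start:end], refusing deletions that would shift the downstream frame."""
    if not 0 <= start < end <= len(cds):
        raise ValueError("deletion coordinates fall outside the coding sequence")
    if (end - start) % 3 != 0:
        raise ValueError("deletion length is not a multiple of three")
    return cds[:start] + cds[end:]


# Hypothetical coding sequence: ATG GCT CTG CTG GTT AAA TGA
cds = "ATGGCTCTGCTGGTTAAATGA"
print(delete_segment_in_frame(cds, 3, 12))  # removes the three codons GCT CTG CTG
```

Deleting bases 3 through 11 instead (eight bases) would raise an error, because the codons downstream of the junction would then be read in a different frame.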

In one embodiment, computer sequence analysis is used to determine the location of the predicted
major antigenic determinant epitopes of the polypeptide. Software capable of carrying out this analysis is
readily available commercially, for example MacVector (IBI, New Haven, CT). The software typically uses
standard algorithms such as the Kyte/Doolittle or Hopp/Woods methods for locating hydrophilic sequences
which are characteristically found on the surface of proteins and are, therefore, likely to act as antigenic
determinants.

Once this analysis is made, polypeptides can be prepared that contain at least the essential features
of the antigenic determinant and that can be employed in the generation of antisera against the polypeptide.
Minigenes or gene fusions encoding these determinants can be constructed and inserted into expression
vectors by standard methods, for example, using PCR methodology.

The gene or gene fragment encoding a polypeptide can be inserted into an expression vector by
standard subcloning techniques. In one embodiment, an E. coli expression vector is used that produces the
recombinant polypeptide as a fusion protein, allowing rapid affinity purification of the protein. Examples of
such fusion protein expression systems are the glutathione S-transferase system (Pharmacia, Piscataway,
NJ), the maltose binding protein system (NEB, Beverley, MA), the FLAG system (IBI, New Haven, CT), and
the 6xHis system (Qiagen, Chatsworth, CA).

Some of these systems produce recombinant polypeptides bearing only a small number of additional
amino acids, which are unlikely to affect the antigenic ability of the recombinant polypeptide. For example,
both the FLAG system and the 6xHis system add only short sequences, both of which are known to be poorly
antigenic and which do not adversely affect folding of the polypeptide to its native conformation. Other
fusion systems produce polypeptides where it is desirable to excise the fusion partner from the desired
polypeptide. In one embodiment, the fusion partner is linked to the recombinant polypeptide by a peptide
sequence containing a specific recognition sequence for a protease. Examples of suitable sequences are
those recognized by the Tobacco Etch Virus protease (Life Technologies, Gaithersburg, MD) or Factor Xa
(New England Biolabs, Beverley, MA).

Recombinant bacterial cells, for example E. coli, are grown in any of a number of suitable media, for
example LB, and the expression of the recombinant polypeptide induced by adding IPTG to the media or
switching incubation to a higher temperature. After culturing the bacteria for a further period of between 2
and 24 hours, the cells are collected by centrifugation and washed to remove residual media. The bacterial
cells are then lysed, for example, by disruption in a cell homogenizer and centrifuged to separate the dense
inclusion bodies and cell membranes from the soluble cell components. This centrifugation can be performed
under conditions whereby the dense inclusion bodies are selectively enriched by incorporation of sugars such
as sucrose into the buffer and centrifugation at a selective speed.

In another embodiment, the expression system used is one driven by the baculovirus polyhedrin
promoter. The gene encoding the polypeptide can be manipulated by standard techniques in order to
facilitate cloning into the baculovirus vector. One baculovirus vector is the pBlueBac vector (Invitrogen,
Sorrento, CA). The vector carrying the gene for the polypeptide is transfected into Spodoptera frugiperda
(Sf9) cells by standard protocols, and the cells are cultured and processed to produce the recombinant
antigen. See Summers et al., A MANUAL OF METHODS FOR BACULOVIRUS VECTORS AND INSECT CELL
CULTURE PROCEDURES, Texas Agricultural Experimental Station.

As an alternative to recombinant polypeptides, synthetic peptides corresponding to the antigenic
determinants can be prepared. Such peptides are at least six amino acid residues long, and may contain up
to approximately 35 residues, which is the approximate upper length limit of automated peptide synthesis
machines, such as those available from Applied Biosystems (Foster City, CA). Use of such small peptides for
vaccination typically requires conjugation of the peptide to an immunogenic carrier protein such as hepatitis
B surface antigen, keyhole limpet hemocyanin or bovine serum albumin. Methods for performing this
conjugation are well known in the art.

In one embodiment, amino acid sequence variants of the polypeptide can be prepared. These may,
for instance, be minor sequence variants of the polypeptide that arise due to natural variation within the
population or they may be homologues found in other species. They also may be sequences that do not
occur naturally but that are sufficiently similar that they function similarly and/or elicit an immune response
that cross-reacts with natural forms of the polypeptide. Sequence variants can be prepared by standard
methods of site-directed mutagenesis such as those described below in the following section.

Amino acid sequence variants of the polypeptide can be substitutional, insertional or deletion
variants. Deletion variants lack one or more residues of the native protein which are not essential for
function or immunogenic activity, and are exemplified by the variants lacking a transmembrane sequence
described above. Another common type of deletion variant is one lacking secretory signal sequences or
signal sequences directing a protein to bind to a particular part of a cell. An example of the latter sequence
is the SH2 domain, which induces protein binding to phosphotyrosine residues.

Substitutional variants typically contain the exchange of one amino acid for another at one or more
sites within the protein, and may be designed to modulate one or more properties of the polypeptide such as
stability against proteolytic cleavage. Substitutions preferably are conservative, that is, one amino acid is
replaced with one of similar shape and charge. Conservative substitutions are well known in the art and
include, for example, the changes of: alanine to serine; arginine to lysine; asparagine to glutamine or
histidine; aspartate to glutamate; cysteine to serine; glutamine to asparagine; glutamate to aspartate;
glycine to proline; histidine to asparagine or glutamine; isoleucine to leucine or valine; leucine to valine or
isoleucine; lysine to arginine; methionine to leucine or isoleucine; phenylalanine to tyrosine, leucine or
methionine; serine to threonine; threonine to serine; tryptophan to tyrosine; tyrosine to tryptophan or
phenylalanine; and valine to isoleucine or leucine.
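
For convenience, the exemplary conservative changes just listed can be held in a lookup table. The Python sketch below transcribes that list using three-letter residue codes; the dictionary and function names are illustrative assumptions. The list is directional as written (for example, alanine to serine is listed but serine to alanine is not), and the sketch preserves that.

```python
# Conservative changes as enumerated in the text: original residue -> permitted replacements.
CONSERVATIVE = {
    "Ala": {"Ser"}, "Arg": {"Lys"}, "Asn": {"Gln", "His"}, "Asp": {"Glu"},
    "Cys": {"Ser"}, "Gln": {"Asn"}, "Glu": {"Asp"}, "Gly": {"Pro"},
    "His": {"Asn", "Gln"}, "Ile": {"Leu", "Val"}, "Leu": {"Val", "Ile"},
    "Lys": {"Arg"}, "Met": {"Leu", "Ile"}, "Phe": {"Tyr", "Leu", "Met"},
    "Ser": {"Thr"}, "Thr": {"Ser"}, "Trp": {"Tyr"}, "Tyr": {"Trp", "Phe"},
    "Val": {"Ile", "Leu"},
}


def is_listed_conservative(original: str, replacement: str) -> bool:
    """Return True if the change appears in the exemplary list above."""
    return replacement in CONSERVATIVE.get(original, set())


print(is_listed_conservative("Lys", "Arg"))
print(is_listed_conservative("Lys", "Asp"))
```

Because the list is exemplary rather than exhaustive, a negative result here means only that the change is not among those enumerated.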

Insertional variants include fusion proteins such as those used to allow rapid purification of the
polypeptide and also can include hybrid proteins containing sequences from other proteins and polypeptides
which are homologues of the polypeptide. For example, an insertional variant could include portions of the
amino acid sequence of the polypeptide from one species, together with portions of the homologous
polypeptide from another species. Other insertional variants can include those in which additional amino
acids are introduced within the coding sequence of the polypeptide. These typically are smaller insertions
than the fusion proteins described above and are introduced, for example, into a protease cleavage site.

In one embodiment, major antigenic determinants of the polypeptide are identified by an empirical
approach in which portions of the gene encoding the polypeptide are expressed in a recombinant host, and
the resulting proteins tested for their ability to elicit an immune response. For example, PCR can be used to
prepare a range of cDNAs encoding peptides lacking successively longer fragments of the C-terminus of the
protein. The immunoprotective activity of each of these peptides then identifies those fragments or domains
of the polypeptide that are essential for this activity. Further experiments in which only a small number of
amino acids are removed at each iteration then allow the antigenic determinants of the polypeptide to be
located.
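
The series of successively shorter coding sequences used in this empirical approach can be enumerated as follows. This Python sketch is illustrative only: it assumes a coding sequence supplied without its stop codon, truncates in whole codons so that every fragment remains in frame, and uses a hypothetical function name and example sequence.

```python
def c_terminal_truncations(cds: str, step_codons: int):
    """Return coding fragments lacking successively longer stretches of the C-terminus."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence must contain whole codons")
    n_codons = len(cds) // 3
    fragments = []
    for removed in range(step_codons, n_codons, step_codons):
        fragments.append(cds[:(n_codons - removed) * 3])
    return fragments


# Hypothetical 12-codon coding sequence, truncated four codons at a time.
cds = "ATG" + "GCT" * 11
for fragment in c_terminal_truncations(cds, step_codons=4):
    print(len(fragment) // 3, "codons retained")
```

A coarse step locates the region of interest; repeating the procedure over that region with a step of one or two codons corresponds to the finer iteration described above.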

Another embodiment for the preparation of the polypeptides according to the invention is the use of
peptide mimetics. Mimetics are peptide-containing molecules that mimic elements of protein secondary
structure. See, for example, Johnson et al., "Peptide Turn Mimetics" in BIOTECHNOLOGY AND PHARMACY,
Pezzuto et al., Eds., Chapman and Hall, New York (1993). The underlying rationale behind the use of peptide
mimetics is that the peptide backbone of proteins exists chiefly to orient amino acid side chains in such a
way as to facilitate molecular interactions, such as those of antibody and antigen. A peptide mimetic is
expected to permit molecular interactions similar to the natural molecule.

Successful applications of the peptide mimetic concept have thus far focused on mimetics of
β-turns within proteins, which are known to be highly antigenic. Likely β-turn structure within a polypeptide
can be predicted by computer-based algorithms as discussed above. Once the component amino acids of the
turn are determined, peptide mimetics can be constructed to achieve a similar spatial orientation of the
essential elements of the amino acid side chains.

Modifications and changes may be made in the structure of a gene and still obtain a functional
molecule that encodes a protein or polypeptide with desirable characteristics. The following is a discussion
based upon changing the amino acids of a protein to create an equivalent, or even an improved,
second-generation molecule. The amino acid changes may be achieved by changing the codons of the DNA
sequence, according to the following data.

For example, certain amino acids may be substituted for other amino acids in a protein structure
without appreciable loss of interactive binding capacity with structures such as, for example, antigen binding
regions of antibodies or binding sites on substrate molecules. Since it is the interactive capacity and nature
of a protein that defines that protein's biological functional activity, certain amino acid substitutions can be
made in a protein sequence, and its underlying DNA coding sequence, and nevertheless obtain a protein with
like properties. It is thus contemplated by the inventors that various changes may be made in the DNA
sequences of genes without appreciable loss of their biological utility or activity.

In making such changes, the hydropathic index of amino acids may be considered. The importance of 
the hydropathic amino acid index in conferring interactive biologic function on a protein is generally 
understood in the art (Kyte & Doolittle, 1982). 

It is accepted that the relative hydropathic character of the amino acid contributes to the
secondary structure of the resultant protein, which in turn defines the interaction of the protein with
other molecules, for example, enzymes, substrates, receptors, DNA, antibodies, antigens, and the like.

Each amino acid has been assigned a hydropathic index on the basis of its hydrophobicity and
charge characteristics (Kyte & Doolittle, 1982). These are: isoleucine (+4.5); valine (+4.2); leucine
(+3.8); phenylalanine (+2.8); cysteine/cystine (+2.5); methionine (+1.9); alanine (+1.8); glycine (-0.4);
threonine (-0.7); serine (-0.8); tryptophan (-0.9); tyrosine (-1.3); proline (-1.6); histidine (-3.2); glutamate
(-3.5); glutamine (-3.5); aspartate (-3.5); asparagine (-3.5); lysine (-3.9); and arginine (-4.5).

It is known in the art that certain amino acids may be substituted by other amino acids having a
similar hydropathic index or score and still result in a protein with similar biological activity, i.e., still
obtain a biological functionally equivalent protein. In making such changes, the substitution of amino
acids whose hydropathic indices are within ±2 is preferred, those which are within ±1 are particularly
preferred, and those within ±0.5 are even more particularly preferred.
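
The three tolerance bands just stated can be expressed as a small helper. The Python sketch below is illustrative; the function name and returned labels are assumptions introduced here, and the example index values are those of Kyte & Doolittle (1982). The same helper applies unchanged to any other per-residue scale for which the same bands are used.

```python
def preference_tier(index_a: float, index_b: float) -> str:
    """Classify a substitution by the difference between two index values."""
    delta = round(abs(index_a - index_b), 1)  # published indices carry one decimal place
    if delta <= 0.5:
        return "even more particularly preferred"
    if delta <= 1.0:
        return "particularly preferred"
    if delta <= 2.0:
        return "preferred"
    return "outside the stated ranges"


# Kyte & Doolittle values: isoleucine +4.5, valine +4.2, leucine +3.8,
# methionine +1.9, arginine -4.5.
print(preference_tier(4.5, 4.2))    # isoleucine -> valine
print(preference_tier(3.8, 1.9))    # leucine -> methionine
print(preference_tier(4.5, -4.5))   # isoleucine -> arginine
```

Rounding the difference to one decimal place avoids floating-point artefacts at the band edges.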

It is also understood in the art that the substitution of like amino acids can be made effectively
on the basis of hydrophilicity. U.S. Patent 4,554,101, incorporated herein by reference, states that the
greatest local average hydrophilicity of a protein, as governed by the hydrophilicity of its adjacent amino
acids, correlates with a biological property of the protein.

As detailed in U.S. Patent 4,554,101, the following hydrophilicity values have been assigned to
amino acid residues: arginine (+3.0); lysine (+3.0); aspartate (+3.0 ± 1); glutamate (+3.0 ± 1); serine
(+0.3); asparagine (+0.2); glutamine (+0.2); glycine (0); threonine (-0.4); proline (-0.5 ± 1); alanine
(-0.5); histidine (-0.5); cysteine (-1.0); methionine (-1.3); valine (-1.5); leucine (-1.8); isoleucine (-1.8);
tyrosine (-2.3); phenylalanine (-2.5); tryptophan (-3.4).

It is understood that an amino acid can be substituted for another having a similar hydrophilicity
value and still obtain a biologically equivalent and immunologically equivalent protein. In such changes, the
substitution of amino acids whose hydrophilicity values are within ±2 is preferred, those that are within
±1 are particularly preferred, and those within ±0.5 are even more particularly preferred.
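
The "greatest local average hydrophilicity" referred to above can be located with a sliding-window average over the hydrophilicity values of U.S. Patent 4,554,101. The Python sketch below is illustrative only: it uses the nominal values (ignoring the stated ± 1 ranges), a six-residue window chosen here as an assumption, and a hypothetical function name and test sequence.

```python
# Nominal hydrophilicity values from U.S. Patent 4,554,101, one-letter codes.
HYDROPHILICITY = {
    "R": 3.0, "K": 3.0, "D": 3.0, "E": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "T": -0.4, "P": -0.5, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "L": -1.8, "I": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}


def greatest_local_hydrophilicity(protein: str, window: int = 6):
    """Return (start index, mean value) of the most hydrophilic window."""
    seq = protein.upper()
    if len(seq) < window:
        raise ValueError("sequence is shorter than the averaging window")
    best_start, best_mean = 0, float("-inf")
    for i in range(len(seq) - window + 1):
        mean = sum(HYDROPHILICITY[aa] for aa in seq[i:i + window]) / window
        if mean > best_mean:
            best_start, best_mean = i, mean
    return best_start, best_mean


# Hypothetical sequence with a charged hexapeptide flanked by hydrophobic residues.
start, mean = greatest_local_hydrophilicity("LLVVAAKRDEKRLLVV")
print(start, mean)
```

The window returned is a candidate surface-exposed segment and hence a candidate antigenic determinant, subject to experimental confirmation.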

As outlined above, amino acid substitutions are generally based on the relative similarity of the
amino acid side-chain substituents, for example, their hydrophobicity, hydrophilicity, charge, size, and the
like. Exemplary substitutions that take various of the foregoing characteristics into consideration are
well known to those of skill in the art and include: arginine and lysine; glutamate and aspartate; serine
and threonine; glutamine and asparagine; and valine, leucine and isoleucine.

G. Site Specific Mutagenesis

Site-specific mutagenesis is a technique useful in the preparation of individual peptides, or
biologically functional equivalent proteins or peptides, through specific mutagenesis of the underlying
DNA. The technique further provides a ready ability to prepare and test sequence variants, incorporating
one or more of the foregoing considerations, by introducing one or more nucleotide sequence changes into
the DNA. Site-specific mutagenesis allows the production of mutants through the use of specific
oligonucleotide sequences which encode the DNA sequence of the desired mutation, as well as a
sufficient number of adjacent nucleotides, to provide a primer sequence of sufficient size and sequence
complexity to form a stable duplex on both sides of the deletion junction being traversed. Typically, a
primer of about 17 to 25 nucleotides in length is preferred, with about 5 to 10 residues on both sides of
the junction of the sequence being altered.
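
The primer geometry described above can be sketched in Python for the simplest case of a single-base substitution. This is an illustrative sketch under stated assumptions, not a primer design protocol: the function name, the 0-based position convention and the template sequence are hypothetical, and no melting-temperature or secondary-structure checks are attempted.

```python
def design_mutagenic_primer(template: str, pos: int, new_base: str, flank: int = 10) -> str:
    """Build a primer carrying new_base at pos with `flank` matching bases on each side."""
    seq = template.upper()
    new_base = new_base.upper()
    if pos - flank < 0 or pos + flank >= len(seq):
        raise ValueError("not enough template on one side of the mutation")
    if seq[pos] == new_base:
        raise ValueError("requested base is already present at this position")
    primer = seq[pos - flank:pos] + new_base + seq[pos + 1:pos + 1 + flank]
    if not 17 <= len(primer) <= 25:
        raise ValueError("primer length falls outside the 17 to 25 nucleotide range")
    return primer


# Hypothetical 33-base template; change the G at position 16 to A.
template = "ATGGTTTCTAAACTGAGCCAGCTGCAGACGGAG"
primer = design_mutagenic_primer(template, pos=16, new_base="A")
print(primer, len(primer))
```

With ten flanking bases on each side the primer is 21 nucleotides long; a flank of eight gives the 17-nucleotide lower bound mentioned in the text.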

In general, the technique of site-specific mutagenesis is well known in the art. As will be 
appreciated, the technique typically employs a bacteriophage vector that exists in both a single stranded 
and double stranded form. Typical vectors useful in site-directed mutagenesis include vectors such as the 
M13 phage. These phage vectors are commercially available and their use is generally well known to 
those skilled in the art. Double stranded plasmids are also routinely employed in site directed 
mutagenesis, which eliminates the step of transferring the gene of interest from a phage to a plasmid. 

In general, site-directed mutagenesis is performed by first obtaining a single-stranded vector, or
melting of two strands of a double stranded vector which includes within its sequence a DNA sequence
encoding the desired protein. An oligonucleotide primer bearing the desired mutated sequence is
synthetically prepared. This primer is then annealed with the single-stranded DNA preparation, and
subjected to DNA polymerizing enzymes such as E. coli polymerase I Klenow fragment, in order to
complete the synthesis of the mutation-bearing strand. Thus, a heteroduplex is formed wherein one
strand encodes the original non-mutated sequence and the second strand bears the desired mutation.
This heteroduplex vector is then used to transform appropriate cells, such as E. coli cells, and clones are
selected that include recombinant vectors bearing the mutated sequence arrangement.
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The preparation of sequence variants of the selected gene using site directed mutagenesis is 
provided as a means of producing potentially useful species and is not meant to be limiting, as there are 
other ways in which sequence variants of genes may be obtained. For example, recombir, vectors 
encoding the desired gene may be treated with mutagenic agents, such as hydroxylamine, to obtain 
5 sequence variants. 

H. Expression and Purification of Encoded Proteins 
1. Expression of Proteins from Cloned cDNAs 

The cDNA species specified in SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, and 
HNF1α can be expressed as encoded peptides or proteins. In other embodiments, cDNA species specified 
in SEQ ID NO:78, SEQ ID NO:34, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID 
NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, and HNF4α can be 
expressed as encoded peptides or proteins. The DNA species specified in SEQ ID NO:128 and HNF1β can 
be expressed as encoded peptides or proteins. The engineering of DNA segment(s) for expression in a 
prokaryotic or eukaryotic system may be performed by techniques generally known to those of skill in 
recombinant expression. It is believed that virtually any expression system may be employed in the 
expression of the claimed nucleic acid sequences. 

Both cDNA and genomic sequences are suitable for eukaryotic expression, as the host cell will 
generally process the genomic transcripts to yield functional mRNA for translation into protein. Generally 
speaking, it may be more convenient to employ as the recombinant gene a cDNA version of the gene. It is 
believed that the use of a cDNA version will provide advantages in that the size of the gene will generally 
be much smaller and more readily employed to transfect the targeted cell than will a genomic gene, which 
will typically be up to an order of magnitude larger than the cDNA gene. However, the inventor does not 
exclude the possibility of employing a genomic version of a particular gene where desired. 

As used herein, the terms "engineered" and "recombinant" cells are intended to refer to a cell into 
which an exogenous DNA segment or gene, such as a cDNA or gene, has been introduced. Therefore, 
engineered cells are distinguishable from naturally occurring cells which do not contain a recombinantly 
introduced exogenous DNA segment or gene. Engineered cells are thus cells having a gene or genes 
introduced through the hand of man. Recombinant cells include those having an introduced cDNA or 
genomic DNA, and also include genes positioned adjacent to a promoter not naturally associated with the 
particular introduced gene. 

To express a recombinant encoded protein or peptide, whether mutant or wild-type, in accordance 
with the present invention one would prepare an expression vector that comprises one of the claimed 
isolated nucleic acids under the control of one or more promoters. To bring a coding sequence "under the 
control of" a promoter, one positions the 5' end of the translational initiation site of the reading frame 
generally between about 1 and 50 nucleotides "downstream" of (i.e., 3' of) the chosen promoter. The 
"upstream" promoter stimulates transcription of the inserted DNA and promotes expression of the 
encoded recombinant protein. This is the meaning of "recombinant expression" in the context used here. 
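
By way of non-limiting illustration, the spacing rule stated above may be checked computationally. The following Python sketch assumes that the spacing is counted as the number of nucleotides lying between the 3' end of the promoter and the first ATG downstream of it; the function names and the placeholder promoter and spacer sequences are hypothetical and do not correspond to any sequence disclosed herein.

```python
def spacing_to_start(construct, promoter, start_codon="ATG"):
    """Count the nucleotides between the 3' end of `promoter` and the
    first start codon located downstream of it."""
    p = construct.find(promoter)
    if p < 0:
        raise ValueError("promoter not found in construct")
    promoter_end = p + len(promoter)
    s = construct.find(start_codon, promoter_end)
    if s < 0:
        raise ValueError("no start codon downstream of promoter")
    return s - promoter_end


def under_control(construct, promoter, min_gap=1, max_gap=50):
    """True when the start codon lies about 1 to 50 nucleotides 3' of the promoter."""
    return min_gap <= spacing_to_start(construct, promoter) <= max_gap


PROMOTER = "TTGACAGCTAGCTCAGTCCTAGGTATAAT"  # placeholder promoter sequence
SPACER = "CCGGAAGGAGC"                       # placeholder 11-nucleotide spacer
construct = "GGCC" + PROMOTER + SPACER + "ATGGTTTCTAAG"
gap = spacing_to_start(construct, PROMOTER)
print(gap, under_control(construct, PROMOTER))
```

A construct whose spacer is lengthened beyond 50 nucleotides would fail the same check, which may be useful as a quick sanity test before an expression vector is assembled.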

Many standard techniques are available to construct expression vectors containing the 
appropriate nucleic acids and transcriptional/translational control sequences in order to achieve protein or 
peptide expression in a variety of host-expression systems. Cell types available for expression include, 
but are not limited to, bacteria, such as E. coli and B. subtilis transformed with recombinant phage DNA, 
plasmid DNA or cosmid DNA expression vectors. 

Certain examples of prokaryotic hosts are E. coli strain RR1, E. coli LE392, E. coli B, E. coli X 
1776 (ATCC No. 31537) as well as E. coli W3110 (F-, lambda-, prototrophic, ATCC No. 273325); bacilli 
such as Bacillus subtilis; and other enterobacteriaceae such as Salmonella typhimurium, Serratia 
marcescens, and various Pseudomonas species. 

In general, plasmid vectors containing replicon and control sequences that are derived from 
species compatible with the host cell are used in connection with these hosts. The vector ordinarily 
carries a replication site, as well as marking sequences that are capable of providing phenotypic selection 
in transformed cells. For example, E. coli is often transformed using pBR322, a plasmid derived from an 
E. coli species. Plasmid pBR322 contains genes for ampicillin and tetracycline resistance and thus 
provides easy means for identifying transformed cells. The pBR322 plasmid, or other microbial plasmid or 
phage must also contain, or be modified to contain, promoters that can be used by the microbial organism 
for expression of its own proteins. 

In addition, phage vectors containing replicon and control sequences that are compatible with the 
host microorganism can be used as transforming vectors in connection with these hosts. For example, [...] 

[...] expression vector 3' of the sequence desired to be expressed to [...] 

Other suitable promoters, which have the additional advantage of transcription controlled by growth 
conditions, include the promoter region for alcohol dehydrogenase [...] phosphatase, degradative enzymes 
associated with [...] glyceraldehyde-3-phosphate dehydrogenase, and enzymes responsible for maltose 
and galactose utilization. 

In addition to micro-organisms, cultures of cells derived from multicellular organisms may also be 
used as hosts. In principle, any such cell culture is workable, whether from vertebrate or invertebrate 
culture. In addition to mammalian cells, these include insect cell systems infected with recombinant virus 
expression vectors (e.g., baculovirus); and plant cell systems infected with recombinant virus expression 
vectors (e.g., cauliflower mosaic virus, CaMV; tobacco mosaic virus, TMV) or transformed with 
recombinant plasmid expression vectors (e.g., Ti plasmid) containing one or more coding sequences. 

In a useful insect system, Autographa californica nuclear polyhedrosis virus (AcNPV) is used as a 
vector to express foreign genes. The virus grows in Spodoptera frugiperda cells. The isolated nucleic 
acid coding sequences are cloned into non-essential regions (for example, the polyhedron gene) of the virus 
and placed under control of an AcNPV promoter (for example, the polyhedron promoter). Successful 
insertion of the coding sequences results in the inactivation of the polyhedron gene and production of 
non-occluded recombinant virus (i.e., virus lacking the proteinaceous coat coded for by the polyhedron gene). 
These recombinant viruses are then used to infect Spodoptera frugiperda cells in which the inserted gene 
is expressed (e.g., U.S. Patent No. 4,215,051). 

Examples of useful mammalian host cell lines are VERO and HeLa cells, Chinese hamster ovary 
(CHO) cell lines, WI38, BHK, COS-7, 293, HepG2, NIH3T3, RIN and MDCK cell lines. In addition, a host 
cell may be chosen that modulates the expression of the inserted sequences, or modifies and processes 
the gene product in the specific fashion desired. Such modifications (e.g., glycosylation) and processing 
(e.g., cleavage) of protein products may be important for the function of the encoded protein. 

Different host cells have characteristic and specific mechanisms for the post-translational 
processing and modification of proteins. Appropriate cell lines or host systems can be chosen to ensure 
the correct modification and processing of the foreign protein expressed. Expression vectors for use in 
mammalian cells ordinarily include an origin of replication (as necessary), a promoter located in front of 
the gene to be expressed, along with any necessary ribosome binding sites, RNA splice sites, 
polyadenylation site, and transcriptional terminator sequences. The origin of replication may be provided 
either by construction of the vector to include an exogenous origin, such as may be derived from SV40 or 
other viral (e.g., Polyoma, Adeno, VSV, BPV) source, or may be provided by the host cell chromosomal 
replication mechanism. If the vector is integrated into the host cell chromosome, the latter is often 
sufficient. 




The promoters may be derived from the genome of mammalian cells (e.g., metallothionein 
promoter) or from mammalian viruses (e.g., the adenovirus late promoter; the vaccinia virus 7.5K 
promoter). Further, it is also possible, and may be desirable, to utilize promoter or control sequences 
normally associated with the desired gene sequence, provided such control sequences are compatible with 
the host cell systems. 

A number of viral based expression systems may be utilized; for example, commonly used 
promoters are derived from polyoma, Adenovirus 2, cytomegalovirus and Simian Virus 40 (SV40). The 
early and late promoters of SV40 virus are useful because both are obtained easily from the virus as a 
fragment which also contains the SV40 viral origin of replication. Smaller or larger SV40 fragments may 
also be used, provided there is included the approximately 250 bp sequence extending from the HindIII 
site toward the BglI site located in the viral origin of replication. 

In cases where an adenovirus is used as an expression vector, the coding sequences may be 
ligated to an adenovirus transcription/translation control complex, e.g., the late promoter and tripartite 
leader sequence. This chimeric gene may then be inserted in the adenovirus genome by in vitro or in vivo 
recombination. Insertion in a non-essential region of the viral genome (e.g., region E1 or E3) will result in 
a recombinant virus that is viable and capable of expressing proteins in infected hosts. 

Specific initiation signals may also be required for efficient translation of the claimed isolated 
nucleic acid coding sequences. These signals include the ATG initiation codon and adjacent sequences. 
Exogenous translational control signals, including the ATG initiation codon, may additionally need to be 
provided. One of ordinary skill in the art would readily be capable of determining this need and providing 
the necessary signals. It is well known that the initiation codon must be in-frame (or in-phase) with the 
reading frame of the desired coding sequence to ensure translation of the entire insert. These exogenous 
translational control signals and initiation codons can be of a variety of origins, both natural and 
synthetic. The efficiency of expression may be enhanced by the inclusion of appropriate transcription 
enhancer elements or transcription terminators (Bittner et al., 1987). 
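
By way of non-limiting illustration, the in-frame requirement may be verified with a short Python sketch. The function names, the leader and the insert sequence below are hypothetical; the sketch merely assumes that positions are 0-based indices into a DNA string and that TAA, TAG and TGA are the stop codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_in_frame(atg_index, insert_start):
    """True when the insert begins a whole number of codons after the ATG."""
    return (insert_start - atg_index) % 3 == 0


def first_stop(seq, atg_index):
    """Index of the first stop codon met when reading triplets from the ATG,
    or None when no stop codon is encountered."""
    for i in range(atg_index, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


insert = "GTTTCTAAGCTGTAA"          # hypothetical insert ending in a stop codon
in_phase = "ATGGCT" + insert        # exogenous ATG plus one codon: frame kept
shifted = "ATGGCTA" + insert        # one extra base: frame shifted

print(is_in_frame(0, 6), first_stop(in_phase, 0))
print(is_in_frame(0, 7), first_stop(shifted, 0))
```

In the shifted construct the reading frame meets a stop codon before the end of the insert, which is the truncation that an in-frame initiation codon is intended to avoid.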

In eukaryotic expression, one will also typically desire to incorporate into the transcriptional unit 
an appropriate polyadenylation site (e.g., 5'-AATAAA-3') if one was not contained within the original 
cloned segment. Typically, the poly A addition site is placed about 30 to 2000 nucleotides "downstream" 
of the termination site of the protein at a position prior to transcription termination. 
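
By way of non-limiting illustration, the placement of the poly A addition site may be checked as follows. The Python sketch assumes the hexamer AATAAA as the polyadenylation signal and measures the distance from the end of the stop codon; the function names and the toy transcript are hypothetical.

```python
def polya_offset(transcript, stop_index, signal="AATAAA"):
    """Distance from the end of the stop codon to the first poly A signal
    downstream of it, or None when no signal is present."""
    cds_end = stop_index + 3
    hit = transcript.find(signal, cds_end)
    return None if hit < 0 else hit - cds_end


def polya_well_placed(transcript, stop_index, lo=30, hi=2000):
    """True when the signal lies about 30 to 2000 nucleotides downstream."""
    offset = polya_offset(transcript, stop_index)
    return offset is not None and lo <= offset <= hi


cds = "ATGGTTTCTAAGTAA"                           # toy coding sequence, stop at index 12
good = cds + "C" * 40 + "AATAAA" + "C" * 20       # signal 40 nt after the stop codon
too_close = cds + "C" * 10 + "AATAAA" + "C" * 20  # signal 10 nt after the stop codon

print(polya_offset(good, 12), polya_well_placed(good, 12))
print(polya_offset(too_close, 12), polya_well_placed(too_close, 12))
```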

For long-term, high yield production of recombinant proteins, stable expression is preferred. For 
example, cell lines that stably express constructs encoding proteins may be engineered. Rather than 
using expression vectors that contain viral origins of replication, host cells can be transformed with 
vectors controlled by appropriate expression control elements (e.g., promoter, enhancer, sequences, 
transcription terminators, polyadenylation sites, etc.), and a selectable marker. Following the introduction 
of foreign DNA, engineered cells may be allowed to grow for 1-2 days in an enriched medium, and then 
are switched to a selective medium. The selectable marker in the recombinant plasmid confers resistance 
to the selection and allows cells to stably integrate the plasmid into their chromosomes and grow to form 
foci, which in turn can be cloned and expanded into cell lines. 

A number of selection systems may be used, including, but not limited to, the herpes simplex virus 
thymidine kinase (Wigler et al., 1977), hypoxanthine-guanine phosphoribosyltransferase (Szybalska et al., 
1962) and adenine phosphoribosyltransferase genes (Lowy et al., 1980), in tk-, hgprt- or aprt- cells, 
respectively. Also, antimetabolite resistance can be used as the basis of selection for dhfr, which confers 
resistance to methotrexate (Wigler et al., 1980; O'Hare et al., 1981); gpt, which confers resistance to 
mycophenolic acid (Mulligan et al., 1981); neo, which confers resistance to the aminoglycoside G-418 
(Colberre-Garapin et al., 1981); and hygro, which confers resistance to hygromycin. 

It is contemplated that the isolated nucleic acids of the invention may be "overexpressed", i.e., 
expressed in increased levels relative to their natural expression in human cells, or even relative to the 
expression of other proteins in the recombinant host cell. Such overexpression may be assessed by a 
variety of methods, including radio-labeling and/or protein purification. However, simple and direct 
methods are preferred, for example, those involving SDS/PAGE and protein staining or western blotting, 
followed by quantitative analyses, such as densitometric scanning of the resultant gel or blot. A specific 
increase in the level of the recombinant protein or peptide in comparison to the level in natural human 
cells is indicative of overexpression, as is a relative abundance of the specific protein in relation to the 
other proteins produced by the host cell and, e.g., visible on a gel. 

2. Purification of Expressed Proteins 

Further aspects of the present invention concern the purification, and in particular embodiments, 
the substantial purification, of an encoded protein or peptide. The term "purified protein or peptide" as 
used herein, is intended to refer to a composition, isolatable from other components, wherein the protein 
or peptide is purified to any degree relative to its naturally-obtainable state, i.e., in this case, relative to 
its purity within a hepatocyte or β-cell extract. A purified protein or peptide therefore also refers to a 
protein or peptide, free from the environment in which it may naturally occur. 

Generally, "purified" will refer to a protein or peptide composition that has been subjected to 
fractionation to remove various other components, and which composition substantially retains its 
expressed biological activity. Where the term "substantially purified" is used, this designation will refer 
to a composition in which the protein or peptide forms the major component of the composition, such as 
constituting about 50% or more of the proteins in the composition. 

Various methods for quantifying the degree of purification of the protein or peptide will be known 
to those of skill in the art in light of the present disclosure. These include, for example, determining the 
specific activity of an active fraction, or assessing the number of polypeptides within a fraction by 
SDS/PAGE analysis. A preferred method for assessing the purity of a fraction is to calculate the specific 
activity of the fraction, to compare it to the specific activity of the initial extract, and to thus calculate 
the degree of purity, herein assessed by a "-fold purification number". The actual units used to represent 
the amount of activity will, of course, be dependent upon the particular assay technique chosen to follow 
the purification and whether or not the expressed protein or peptide exhibits a detectable activity. 
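
By way of non-limiting illustration, a "-fold purification number" may be computed as sketched below. The Python sketch assumes the conventional definition, namely the specific activity of a fraction (activity units per milligram of total protein) divided by the specific activity of the starting extract; the function names and the numerical values are hypothetical and are not measured data.

```python
def specific_activity(activity_units, protein_mg):
    """Activity units per milligram of total protein."""
    if protein_mg <= 0:
        raise ValueError("protein amount must be positive")
    return activity_units / protein_mg


def fold_purification(fraction, initial):
    """Ratio of the specific activity of a fraction to that of the initial extract.
    Each argument is an (activity_units, protein_mg) pair."""
    return specific_activity(*fraction) / specific_activity(*initial)


def percent_yield(fraction, initial):
    """Share of the starting activity recovered in the fraction."""
    return 100.0 * fraction[0] / initial[0]


initial_extract = (1000.0, 500.0)   # hypothetical: 1000 units in 500 mg protein
column_fraction = (600.0, 10.0)     # hypothetical: 600 units in 10 mg protein

print(fold_purification(column_fraction, initial_extract))
print(percent_yield(column_fraction, initial_extract))
```

Reporting the recovery alongside the fold purification reflects the trade-off between relative purification and total recovery of protein product.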

Various techniques suitable for use in protein purification will be well known to those of skill in 
the art. These include, for example, precipitation with ammonium sulphate, polyethylene glycol, 
antibodies and the like or by heat denaturation, followed by centrifugation; chromatography steps such as 
ion exchange, gel filtration, reverse phase, hydroxylapatite and affinity chromatography; isoelectric 
focusing; gel electrophoresis; and combinations of such and other techniques. As is generally known in 
the art, it is believed that the order of conducting the various purification steps may be changed, or that 
certain steps may be omitted, and still result in a suitable method for the preparation of a substantially 
purified protein or peptide. 

There is no general requirement that the protein or peptide always be provided in their most 
purified state. Indeed, it is contemplated that less substantially purified products will have utility in 
certain embodiments. Partial purification may be accomplished by using fewer purification steps in 
combination, or by utilizing different forms of the same general purification scheme. For example, it is 
appreciated that a cation exchange column chromatography performed utilizing an HPLC apparatus will 
generally result in a greater -fold purification than the same technique utilizing a low pressure 
chromatography system. Methods exhibiting a lower degree of relative purification may have advantages 
in total recovery of protein product, or in maintaining the activity of an expressed protein. 

It is known that the migration of a polypeptide can vary, sometimes significantly, with different 
conditions of SDS/PAGE (Capaldi et al., Biochem. Biophys. Res. Comm., 76:425, 1977). It will therefore 
be appreciated that under differing electrophoresis conditions, the apparent molecular weights of purified 
or partially purified expression products may vary. 

I. Preparation of Antibodies Specific for Encoded Proteins 
1. Antibody Generation 

For some embodiments, it will be desired to produce antibodies that bind with high specificity to 
the protein product(s) of an isolated nucleic acid selected from the group comprising SEQ ID NO:1, SEQ ID 
NO:3, SEQ ID NO:5, SEQ ID NO:7 or any other mutant of HNF1α, SEQ ID NO:78, SEQ ID NO:34, SEQ ID 
NO:36, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID 
NO:50, SEQ ID NO:52, SEQ ID NO:54, or any other mutant of HNF4α, SEQ ID NO:128 (HNF1β) or any 
mutant of HNF1β. Means for preparing and characterizing antibodies are well known in the art (See, e.g., 
Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory, 1988; incorporated herein by 
reference). 

Methods for generating polyclonal antibodies are well known in the art. Briefly, a polyclonal 
antibody is prepared by immunizing an animal with an antigenic composition and collecting antisera from 
that immunized animal. A wide range of animal species can be used for the production of antisera. 
Typically the animal used for production of antisera is a rabbit, a mouse, a rat, a hamster, a guinea pig or 
a goat. Because of the relatively large blood volume of rabbits, a rabbit is a preferred choice for 
production of polyclonal antibodies. 

As is well known in the art, a given composition may vary in its immunogenicity. It is often 
necessary therefore to boost the host immune system, as may be achieved by coupling a peptide or 
polypeptide immunogen to a carrier. Exemplary and preferred carriers are keyhole limpet hemocyanin 
(KLH) and bovine serum albumin (BSA). Other albumins such as ovalbumin, mouse serum albumin or rabbit 
serum albumin can also be used as carriers. Means for conjugating a polypeptide to a carrier protein are 
well known in the art and include glutaraldehyde, m-maleimidobenzoyl-N-hydroxysuccinimide ester, 
carbodiimide and bis-diazotized benzidine. 

As is also well known in the art, the immunogenicity of a particular immunogen composition can 
be enhanced by the use of non-specific stimulators of the immune response, known as adjuvants. 
Exemplary and preferred adjuvants include complete Freund's adjuvant (a non-specific stimulator of the 
immune response containing killed Mycobacterium tuberculosis), incomplete Freund's adjuvants and 
aluminum hydroxide adjuvant. 

The amount of immunogen composition used in the production of polyclonal antibodies varies 
upon the nature of the immunogen as well as the animal used for immunization. A variety of routes can 
be used to administer the immunogen (subcutaneous, intramuscular, intradermal, intravenous and 
intraperitoneal). The production of polyclonal antibodies may be monitored by sampling blood of the 
immunized animal at various points following immunization. A second, booster injection may also be 
given. The process of boosting and titering is repeated until a suitable titer is achieved. When a desired 
level of immunogenicity is obtained, the immunized animal can be bled and the serum isolated and stored, 
and/or in some cases the animal can be used to generate MAbs. For production of rabbit polyclonal 
antibodies, the animal can be bled through an ear vein or alternatively by cardiac puncture. The removed 
blood is allowed to coagulate and then centrifuged to separate serum components from whole cells and 
blood clots. The serum may be used as is for various applications or the desired antibody fraction may be 
purified by well-known methods, such as affinity chromatography using another antibody or a peptide 
bound to a solid matrix. 

Monoclonal antibodies (MAbs) may be readily prepared through use of well-known techniques, 
such as those exemplified in U.S. Patent 4,196,265, incorporated herein by reference. Typically, this 
technique involves immunizing a suitable animal with a selected immunogen composition, e.g., a purified 
or partially purified expressed protein, polypeptide or peptide. The immunizing composition is 
administered in a manner that effectively stimulates antibody producing cells. 

The methods for generating monoclonal antibodies (MAbs) generally begin along the same lines as 
those for preparing polyclonal antibodies. Rodents such as mice and rats are preferred animals; however, 
the use of rabbit, sheep or frog cells is also possible. The use of rats may provide certain advantages 
(Goding, 1986, pp. 60-61), but mice are preferred, with the BALB/c mouse being most preferred as this is 
most routinely used and generally gives a higher percentage of stable fusions. 

The animals are injected with antigen as described above. The antigen may be coupled to carrier 
molecules such as keyhole limpet hemocyanin if necessary. The antigen would typically be mixed with 
adjuvant, such as Freund's complete or incomplete adjuvant. Booster injections with the same antigen 
would occur at approximately two-week intervals. 

Following immunization, somatic cells with the potential for producing antibodies, specifically B 
lymphocytes (B cells), are selected for use in the MAb generating protocol. These cells may be obtained 
from biopsied spleens, tonsils or lymph nodes, or from a peripheral blood sample. Spleen cells and 
peripheral blood cells are preferred, the former because they are a rich source of antibody-producing cells 
that are in the dividing plasmablast stage, and the latter because peripheral blood is easily accessible. 
Often, a panel of animals will have been immunized and the spleen of the animal with the highest antibody 
titer will be removed and the spleen lymphocytes obtained by homogenizing the spleen with a syringe. 
Typically, a spleen from an immunized mouse contains approximately 5 X 10^7 to 2 X 10^8 lymphocytes. 

The antibody-producing B lymphocytes from the immunized animal are then fused with cells of an 
immortal myeloma cell, generally one of the same species as the animal that was immunized. Myeloma 
cell lines suited for use in hybridoma-producing fusion procedures preferably are non-antibody-producing, 
have high fusion efficiency, and have enzyme deficiencies that render them incapable of growing in 
certain selective media that support the growth of only the desired fused cells (hybridomas). 

Any one of a number of myeloma cells may be used, as are known to those of skill in the art 
(Goding, pp. 65-66, 1986; Campbell, pp. 75-83, 1984). For example, where the immunized animal is a 
mouse, one may use P3-X63/Ag8, X63-Ag8.653, NS1/1.Ag 4 1, Sp210-Ag14, FO, NSO/U, MPC-11, 
MPC11-X45-GTG 1.7 and S194/5XX0 Bul; for rats, one may use R210.RCY3, Y3-Ag 1.2.3, IR983F and 
4B210; and U-266, GM1500-GRG2, LICR-LON-HMy2 and UC729-6 are all useful in connection with 
human cell fusions. 

One preferred murine myeloma cell is the NS-1 myeloma cell line (also termed P3-NS-1-Ag4-1), 
which is readily available from the NIGMS Human Genetic Mutant Cell Repository by requesting cell line 
repository number GM3573. Another mouse myeloma cell line that may be used is the 
8-azaguanine-resistant mouse murine myeloma SP2/0 non-producer cell line. 

Methods for generating hybrids of antibody-producing spleen or lymph node cells and myeloma cells 
usually comprise mixing somatic cells with myeloma cells in a [...] proportion, though the proportion 
may vary [...], in the presence of an agent or agents that promote the fusion of cell membranes. Fusion 
methods [...] have been described by Kohler and Milstein (1975; 1976), and those using polyethylene 
glycol (PEG), such as [...], by Gefter et al. (1977). The use of electrically induced fusion methods is also 
appropriate [...]. 

Fusion procedures usually produce viable hybrids at low frequencies [...]. However, this low 
frequency does not pose a problem, as the viable, fused hybrids are differentiated from the parental, 
unfused cells (particularly the unfused myeloma cells that would normally continue to divide 
indefinitely) by culturing in a selective medium. The selective medium is generally one that contains an 
agent that blocks the de novo synthesis of nucleotides in the tissue culture media. Exemplary and 
preferred agents are aminopterin, methotrexate, and azaserine. Aminopterin and methotrexate block 
de novo synthesis of both purines and pyrimidines, whereas azaserine blocks only purine synthesis. Where 
aminopterin or methotrexate is used, the media is supplemented with hypoxanthine and thymidine as a 
source of nucleotides (HAT medium). Where azaserine is used, the media is supplemented with 
hypoxanthine. 

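The preferred selection medium is HAT. Only cells capable of operating nucleotide salvage 
pathways are able to survive in HAT medium. The myeloma cells are defective in key enzymes of the 
salvage pathway and cannot survive. The B cells can operate this pathway, but they have a limited life 
span in culture. Therefore, the only cells that can survive in the selective media are those hybrids 
formed from myeloma and B cells. 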
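
By way of non-limiting illustration, the logic of the selection may be expressed as a small truth table in Python. The sketch reduces each cell type to two assumed properties, namely whether it can use a salvage pathway when de novo nucleotide synthesis is blocked and whether it divides indefinitely in culture; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    name: str
    salvage_pathway: bool        # can use hypoxanthine/thymidine when de novo synthesis is blocked
    divides_indefinitely: bool   # immortal in culture


def survives_selection(cell: Cell) -> bool:
    """A cell persists in the selective medium only if it can both make
    nucleotides by salvage and keep dividing."""
    return cell.salvage_pathway and cell.divides_indefinitely


population = [
    Cell("unfused myeloma", salvage_pathway=False, divides_indefinitely=True),
    Cell("unfused B lymphocyte", salvage_pathway=True, divides_indefinitely=False),
    Cell("hybridoma", salvage_pathway=True, divides_indefinitely=True),
]

survivors = [c.name for c in population if survives_selection(c)]
print(survivors)
```

Only the fused cell combines the salvage capacity of the B lymphocyte with the unlimited growth of the myeloma parent, which is why it alone remains after selection.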

This culturing provides a population of hybridomas from which specific hybridomas are selected. 
Typically, selection of hybridomas is performed by culturing the cells by single-clone dilution in microtiter 
plates, followed by testing the individual clonal supernatants (after about two to three weeks) for the 
desired reactivity. The assay should be sensitive, simple and rapid, such as radioimmunoassays, enzyme 
immunoassays, cytotoxicity assays, plaque assays, dot immunobinding assays, and the like. 

The selected hybridomas would then be serially diluted and cloned into individual 
antibody-producing cell lines, which can then be propagated indefinitely to provide MAbs. The cell lines 
may be exploited for MAb production in two basic ways. A sample of the hybridoma can be injected 
(often into the peritoneal cavity) into a histocompatible animal of the type that was used to provide the 
somatic and myeloma cells for the original fusion. The injected animal develops tumors secreting the 
specific monoclonal antibody produced by the fused cell hybrid. The body fluids of the animal, such as 
serum or ascites fluid, can then be tapped to provide MAbs in high concentration. The individual cell lines 
could also be cultured in vitro, where the MAbs are naturally secreted into the culture medium from which 
they can be readily obtained in high concentrations. MAbs produced by either means may be further 
purified, if desired, using filtration, centrifugation and various chromatographic methods such as HPLC or 
affinity chromatography. 

Large amounts of the monoclonal antibodies of the present invention may also be obtained by 
multiplying hybridoma cells in vivo. Cell clones are injected into mammals that are histocompatible with 
the parent cells, e.g., syngeneic mice, to cause growth of antibody-producing tumors. Optionally, the 
animals are primed with a hydrocarbon, especially oils such as pristane (tetramethylpentadecane), prior to 
injection. 

In accordance with the present invention, fragments of the monoclonal antibody of the invention 
can be obtained from the monoclonal antibody produced as described above, by methods which include 
digestion with enzymes such as pepsin or papain and/or cleavage of disulfide bonds by chemical reduction. 
Alternatively, monoclonal antibody fragments encompassed by the present invention can be synthesized 
using an automated peptide synthesizer, or by expression of full-length gene or of gene fragments in E. 
coli. 

The monoclonal conjugates of the present invention are prepared by methods known in the art, 
e.g., by reacting a monoclonal antibody prepared as described above with, for instance, an enzyme in the 
presence of a coupling agent such as glutaraldehyde or periodate. Conjugates with fluorescein markers 
are prepared in the presence of these coupling agents or by reaction with an isothiocyanate. Conjugates 
with metal chelates are similarly produced. Other moieties to which antibodies may be conjugated include 
radionuclides such as ³H, ¹²⁵I, ¹³¹I, ³⁵S, ¹⁴C, ⁵¹Cr, ³⁶Cl, ⁵⁷Co, ⁵⁸Co, ⁵⁹Fe, ⁷⁵Se, ¹⁵²Eu and ⁹⁹ᵐTc, which are 
other useful labels that can be conjugated to antibodies. Radioactively labeled monoclonal antibodies of 
[...] incubating pertechnetate, a reducing agent such as SnCl2, a [...] 

It will be appreciated [...] for HNF1α, HNF1β or HNF4α (for proteins that are mutated in MODY3, 
MODY4 and MODY1) will have [...] that such uses are within the scope of the present invention. 

J. Immunodetection Assays 

The immunodetection methods of the present invention [...] the diagnosis of [...] titering of 
antigen or antibody samples, in the selection of hybridomas, and the like. 

In the clinical diagnosis [...] of an antigen encoded by an HNF1α nucleic acid, HNF4α nucleic acid, 
HNF1β nucleic acid, or an [...] in the levels of such an antigen, in comparison to [...] diagnostic 
methods lies, in part, [...]. Those of skill in the art are very familiar with differentiating between 
significant expression [...] biomarker. Indeed, background expression levels are often used to form a 
"cut-off" above which increased staining will be scored as significant or positive. Significant expression 
may be represented by high levels of antigens in tissues or within body fluids, or alternatively, by a high 
proportion of cells from within a tissue that each give a positive signal. 
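
By way of non-limiting illustration, a background-derived cut-off may be applied as sketched below. The Python sketch assumes, purely for illustration, that the cut-off is set at the mean background signal plus two standard deviations; this rule, the function names and all numerical values are hypothetical and are not taken from the present disclosure.

```python
from statistics import mean, stdev


def cutoff_from_background(background, k=2.0):
    """Cut-off set at mean background plus k sample standard deviations
    (an assumed convention, not one stated in the text)."""
    return mean(background) + k * stdev(background)


def score(samples, cutoff):
    """Label each sample True (significant/positive) when its signal exceeds the cut-off."""
    return {name: value > cutoff for name, value in samples.items()}


background = [0.10, 0.12, 0.08, 0.11, 0.09]       # hypothetical control readings
samples = {"sample A": 0.45, "sample B": 0.12}    # hypothetical test readings

cutoff = cutoff_from_background(background)
calls = score(samples, cutoff)
print(round(cutoff, 4), calls)
```
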
1. Immunodetection Methods 

In still further embodiments, the present invention concerns immunodetection methods for 
binding, purifying, removing, quantifying or otherwise generally detecting biological components. The 
encoded proteins or peptides of the present invention may be employed to detect antibodies having 
reactivity therewith, or, alternatively, antibodies prepared in accordance with the present invention may 
be employed to detect the encoded proteins or peptides. The steps of various useful immunodetection 
methods have been described in the scientific literature, such as, e.g., Nakamura et al. (1987). 

In general, the immunobinding methods include obtaining a sample suspected of containing a 
protein, peptide or antibody, and contacting the sample with an antibody or protein or peptide in 
accordance with the present invention, as the case may be, under conditions effective to allow the 
formation of immunocomplexes. 

The immunobinding methods include methods for detecting or quantifying the amount of a 
reactive component in a sample, which methods require the detection or quantitation of any immune 
complexes formed during the binding process. Here, one would obtain a sample suspected of containing an 
HNF1α or HNF4α mutant encoded protein, peptide or a corresponding antibody, and contact the sample 
with an antibody or encoded protein or peptide, as the case may be, and then detect or quantify the 
amount of immune complexes formed under the specific conditions. 

In terms of antigen detection, the biological sample analyzed may be any sample that is suspected 
of containing an HNF1α, HNF1β or HNF4α antigen, such as a pancreatic β-cell, a homogenized tissue 
extract, an isolated cell, a cell membrane preparation, separated or purified forms of any of the above 
protein-containing compositions, or even any biological fluid that comes into contact with diabetic tissue, 
including blood. 

Contacting the chosen biological sample with the protein, peptide or antibody under conditions
effective and for a period of time sufficient to allow the formation of immune complexes (primary immune
complexes) is generally a matter of simply adding the composition to the sample and incubating the
mixture for a period of time long enough for the antibodies to form immune complexes with, i.e., to bind
to, any antigens present. After this time, the sample-antibody composition, such as a tissue section,
ELISA plate, dot blot or western blot, will generally be washed to remove any non-specifically bound
antibody species, allowing only those antibodies specifically bound within the primary immune complexes
to be detected.

In general, the detection of immunocomplex formation is well known in the art and may be
achieved through the application of numerous approaches. These methods are generally based upon the
detection of a label or marker, such as any radioactive, fluorescent, biological or enzymatic tags or labels
of standard use in the art. U.S. Patents concerning the use of such labels include 3,817,837; 3,850,752;
3,939,350; 3,996,345; 4,277,437; 4,275,149 and 4,366,241, each incorporated herein by reference.
Of course, one may find additional advantages through the use of a secondary binding ligand such as a
second antibody or a biotin/avidin ligand binding arrangement, as is known in the art.

The encoded protein, peptide or corresponding antibody employed in the detection may itself be
linked to a detectable label, wherein one would then simply detect this label, thereby allowing the amount
of the primary immune complexes in the composition to be determined.

Alternatively, the first added component that becomes bound within the primary immune
complexes may be detected by means of a second binding ligand that has binding affinity for the encoded
protein, peptide or corresponding antibody. In these cases, the second binding ligand may be linked to a
detectable label. The second binding ligand is itself often an antibody, which may thus be termed a
"secondary" antibody. The primary immune complexes are contacted with the labeled, secondary binding
ligand, or antibody, under conditions effective and for a period of time sufficient to allow the formation of
secondary immune complexes. The secondary immune complexes are then generally washed to remove
any non-specifically bound labeled secondary antibodies or ligands, and the remaining label in the
secondary immune complexes is then detected.

Further methods include the detection of primary immune complexes by a two step approach. A
second binding ligand, such as an antibody, that has binding affinity for the encoded protein, peptide or
corresponding antibody is used to form secondary immune complexes, as described above. After
washing, the secondary immune complexes are contacted with a third binding ligand or antibody that has
binding affinity for the second antibody, again under conditions effective and for a period of time
sufficient to allow the formation of immune complexes (tertiary immune complexes). The third ligand or
antibody is linked to a detectable label, allowing detection of the tertiary immune complexes thus formed.
This system may provide for signal amplification if desired.
2. Immunohistochemistry 
The antibodies of the present invention may also be used in conjunction with both fresh-frozen
and formalin-fixed, paraffin-embedded tissue blocks prepared for study by immunohistochemistry (IHC).
For example, each tissue block consists of 50 mg of residual "pulverized" diabetic tissue. The method of
preparing tissue blocks from these particulate specimens has been successfully used in previous IHC
studies of various prognostic factors, and is well known to those of skill in the art (Brown et al., 1990;
Abbondanzo et al., 1990; Allred et al., 1990).

Briefly, frozen-sections may be prepared by rehydrating 50 mg of frozen "pulverized" diabetic
tissue at room temperature in phosphate buffered saline (PBS) in small plastic capsules; pelleting the
particles by centrifugation; resuspending them in a viscous embedding medium (OCT); inverting the
capsule and pelleting again by centrifugation; snap-freezing in -70°C isopentane; cutting the plastic
capsule and removing the frozen cylinder of tissue; securing the tissue cylinder on a cryostat microtome
chuck; and cutting 25-50 serial sections.

Permanent-sections may be prepared by a similar method involving rehydration of the 50 mg
sample in a plastic microfuge tube; pelleting; resuspending in 10% formalin for 4 hours fixation;
washing/pelleting; resuspending in warm 2.5% agar; pelleting; cooling in ice water to harden the agar;
removing the tissue/agar block from the tube; infiltrating and embedding the block in paraffin; and cutting
up to 50 serial permanent sections.
3. ELISA

As noted, it is contemplated that the encoded proteins or peptides of the invention will find utility
as immunogens, e.g., in connection with vaccine development, in immunohistochemistry and in ELISA
assays. One evident utility of the encoded antigens and corresponding antibodies is in immunoassays for
the detection of HNF1α, HNF1β and HNF4α mutant proteins, as needed in diagnosis and prognostic
monitoring of MODY.

Immunoassays, in their most simple and direct sense, are binding assays. Certain preferred
immunoassays are the various types of enzyme linked immunosorbent assays (ELISA) and
radioimmunoassays (RIA) known in the art. Immunohistochemical detection using tissue sections is also
particularly useful. However, it will be readily appreciated that detection is not limited to such
techniques, and western blotting, dot blotting, FACS analyses, and the like may also be used.

In one exemplary ELISA, antibodies binding to the encoded proteins of the invention are
immobilized onto a selected surface exhibiting protein affinity, such as a well in a polystyrene microtiter
plate. Then, a test composition suspected of containing the HNF1α, HNF1β or HNF4α mutant, such as a
clinical sample, is added to the wells. After binding and washing to remove non-specifically bound
immune complexes, the bound antigen may be detected. Detection is generally achieved by the addition
of a second antibody specific for the target protein, that is linked to a detectable label. This type of
ELISA is a simple "sandwich ELISA". Detection may also be achieved by the addition of a second
antibody, followed by the addition of a third antibody that has binding affinity for the second antibody,
with the third antibody being linked to a detectable label.

In another exemplary ELISA, the samples suspected of containing the mutant HNF1α, HNF1β or
HNF4α antigen are immobilized onto the well surface and then contacted with the antibodies of the
invention. After binding and washing to remove non-specifically bound immune complexes, the bound
antibody is detected. Where the initial antibodies are linked to a detectable label, the immune complexes
may be detected directly. Again, the immune complexes may be detected using a second antibody that
has binding affinity for the first antibody, with the second antibody being linked to a detectable label.

Another ELISA, in which the proteins or peptides are immobilized, involves the use of antibody
competition in the detection. In this ELISA, labeled antibodies are added to the wells, allowed to bind to
the mutant HNF1α protein, mutant HNF1β protein or mutant HNF4α protein, and detected by means of
their label. The amount of marker antigen in an unknown sample is then determined by mixing the sample
with the labeled antibodies before or during incubation with coated wells. The presence of marker
antigen in the sample acts to reduce the amount of antibody available for binding to the well and thus
reduces the ultimate signal. This is appropriate for detecting antibodies in an unknown sample, where the
unlabeled antibodies bind to the antigen-coated wells and also reduce the amount of antigen available to
bind the labeled antibodies.

Irrespective of the format employed, ELISAs have certain features in common, such as coating,
incubating or binding, washing to remove non-specifically bound species, and detecting the bound immune
complexes. These are described as follows:

In coating a plate with either antigen or antibody, one will generally incubate the wells of the
plate with a solution of the antigen or antibody, either overnight or for a specified period of hours. The
wells of the plate will then be washed to remove incompletely adsorbed material. Any remaining available
surfaces of the wells are then "coated" with a nonspecific protein that is antigenically neutral with
regard to the test antisera. These include bovine serum albumin (BSA), casein and solutions of milk
powder. The coating of nonspecific adsorption sites on the immobilizing surface reduces the background
caused by nonspecific binding of antisera to the surface.

In ELISAs, it is probably more customary to use a secondary or tertiary detection means rather
than a direct procedure. Thus, after binding of a protein or antibody to the well, coating with a
non-reactive material to reduce background, and washing to remove unbound material, the immobilizing
surface is contacted with the control MODY3, MODY4 or MODY1 and/or clinical or biological sample to
be tested under conditions effective to allow immune complex (antigen/antibody) formation. Detection of
the immune complex then requires a labeled secondary binding ligand or antibody, or a secondary binding
ligand or antibody in conjunction with a labeled tertiary antibody or third binding ligand.

"Under conditions effective to allow immune complex (antigen/antibody) formation" means that
the conditions preferably include diluting the antigens and antibodies with solutions such as BSA, bovine
gamma globulin (BGG) and phosphate buffered saline (PBS)/Tween™. These added agents also tend to
assist in the reduction of nonspecific background.

The "suitable" conditions also mean that the incubation is at a temperature and for a period of
time sufficient to allow effective binding. Incubation steps are typically from about 1 to 2 to 4 hours, at
temperatures preferably on the order of 25° to 27°C, or may be overnight at about 4°C or so.

Following all incubation steps in an ELISA, the contacted surface is washed so as to remove
non-complexed material. A preferred washing procedure includes washing with a solution such as
PBS/Tween™, or borate buffer. Following the formation of specific immune complexes between the test
sample and the originally bound material, and subsequent washing, the occurrence of even minute
amounts of immune complexes may be determined.

To provide a detecting means, the second or third antibody will have an associated label to allow
detection. Preferably, this label will be an enzyme that will generate color development upon incubating
with an appropriate chromogenic substrate. Thus, for example, one will desire to contact and incubate
the first or second immune complex with a urease, glucose oxidase, alkaline phosphatase or hydrogen
peroxidase-conjugated antibody for a period of time and under conditions that favor the development of
further immune complex formation (e.g., incubation for 2 hours at room temperature in a PBS-containing
solution such as PBS-Tween™).

After incubation with the labeled antibody, and subsequent to washing to remove unbound
material, the amount of label is quantified, e.g., by incubation with a chromogenic substrate such as urea
and bromocresol purple or 2,2'-azino-di-(3-ethyl-benzthiazoline-6-sulfonic acid) (ABTS) and H2O2, in the
case of peroxidase as the enzyme label. Quantitation is then achieved by measuring the degree of color
generation, e.g., using a visible spectra spectrophotometer.
4. Use of Antibodies for Radioimaging

The antibodies of this invention will be used to quantify and localize the expression of the
encoded marker proteins. The antibody, for example, will be labeled by any one of a variety of
methods and used to visualize the localized concentration of the cells producing the encoded protein. Such
an assay also will reveal the subcellular location of the protein, which can have diagnostic and
therapeutic applications.

In accordance with this invention, the monoclonal antibody or fragment thereof may be labeled by
any of several techniques known to the art. The methods of the present invention may also use
paramagnetic isotopes for purposes of in vivo detection. Elements particularly useful in Magnetic
Resonance Imaging ("MRI") include ¹⁵⁷Gd, ⁵⁵Mn, ¹⁶²Dy, ⁵²Cr, and ⁵⁶Fe.

Administration of the labeled antibody may be local or systemic and accomplished intravenously,
intraarterially, via the spinal fluid or the like. Administration may also be intradermal or intracavitary,
depending upon the body site under examination. After a sufficient time has lapsed for the monoclonal
antibody or fragment thereof to bind with the diseased tissue, for example 30 minutes to 48 hours, the
area of the subject under investigation is examined by routine imaging techniques such as MRI, SPECT,
planar scintillation imaging or newly emerging imaging techniques. The exact protocol will necessarily
vary depending upon factors specific to the patient, as noted above, and depending upon the body site
under examination, method of administration and type of label used; the determination of specific
procedures would be routine to the skilled artisan. The distribution of the bound radioactive isotope and
its increase or decrease with time is then monitored and recorded. By comparing the results with data
obtained from studies of clinically normal individuals, the presence and extent of the diseased tissue can
be determined.

It will be apparent to those of skill in the art that a similar approach may be used to radio-image
the production of the encoded HNF1α, HNF1β or HNF4α mutant proteins in human patients. The present
invention provides methods for the in vivo diagnosis of MODY3, MODY4 or MODY1 in a patient. Such
methods generally comprise administering to a patient an effective amount of an HNF1α, HNF1β or
HNF4α mutant specific antibody, to which antibody is conjugated a marker, such as a radioactive isotope
or a spin-labeled molecule, that is detectable by non-invasive methods. The antibody-marker conjugate is
allowed sufficient time to come into contact with reactive antigens that are present within the tissues of
the patient, and the patient is then exposed to a detection device to identify the detectable marker.
5. Kits 

In still further embodiments, the present invention concerns immunodetection kits for use with
the immunodetection methods described above. As the encoded proteins or peptides may be employed to
detect antibodies and the corresponding antibodies may be employed to detect encoded proteins or
peptides, either or both of such components may be provided in the kit. The immunodetection kits will
thus comprise, in suitable container means, an encoded protein or peptide, or a first antibody that binds to
an encoded protein or peptide, and an immunodetection reagent.

In certain embodiments, the encoded protein or peptide, or the first antibody that binds to the 
encoded protein or peptide, may be bound to a solid support, such as a column matrix or well of a 
microtiter plate. 

The immunodetection reagents of the kit may take any one of a variety of forms, including those
detectable labels that are associated with or linked to the given antibody or antigen, and detectable labels
that are associated with or attached to a secondary binding ligand. Exemplary secondary ligands are
those secondary antibodies that have binding affinity for the first antibody or antigen, and secondary
antibodies that have binding affinity for a human antibody.

Further suitable immunodetection reagents for use in the present kits include the two-component
reagent that comprises a secondary antibody that has binding affinity for the first antibody or antigen,
along with a third antibody that has binding affinity for the second antibody, the third antibody being
linked to a detectable label.

The kits may further comprise a suitably aliquoted composition of the encoded protein or
polypeptide antigen, whether labeled or unlabeled, as may be used to prepare a standard curve for a
detection assay.
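
As one illustration of how such a standard curve might be used, the following minimal Python sketch estimates the concentration of an unknown sample from the absorbance readings of aliquoted standards. The function name, the example concentrations and absorbances, and the choice of piecewise-linear interpolation are assumptions made for illustration only and are not taken from the present disclosure.

```python
def concentration_from_standard_curve(standards, absorbance):
    """Estimate a concentration by piecewise-linear interpolation.

    standards: list of (concentration, absorbance) pairs measured from
    aliquots of known antigen concentration (hypothetical values here).
    """
    # Sort by absorbance so that bracketing standards are adjacent.
    points = sorted(standards, key=lambda pair: pair[1])
    for (c_lo, a_lo), (c_hi, a_hi) in zip(points, points[1:]):
        if a_lo <= absorbance <= a_hi and a_hi > a_lo:
            fraction = (absorbance - a_lo) / (a_hi - a_lo)
            return c_lo + fraction * (c_hi - c_lo)
    # Readings outside the standards are not extrapolated.
    raise ValueError("absorbance lies outside the range of the standards")


# Hypothetical standards: (concentration in ng/ml, absorbance units)
example_standards = [(0.0, 0.0), (10.0, 0.5), (20.0, 1.0)]
estimate = concentration_from_standard_curve(example_standards, 0.75)
```

In practice a fitted curve (for example, a four-parameter logistic fit) may be preferred; linear interpolation is used here only to keep the sketch short.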

The kits may contain antibody-label conjugates either in fully conjugated form, in the form of
intermediates, or as separate moieties to be conjugated by the user of the kit. The components of the
kits may be packaged either in aqueous media or in lyophilized form.

The container means of the kits will generally include at least one vial, test tube, flask, bottle,
syringe or other container means, into which the antibody or antigen may be placed, and preferably,
suitably aliquoted. Where a second or third binding ligand or additional component is provided, the kit will
also generally contain a second, third or other additional container into which this ligand or component
may be placed. The kits of the present invention will also typically include a means for containing the
antibody, antigen, and any other reagent containers in close confinement for commercial sale. Such
containers may include injection or blow-molded plastic containers into which the desired vials are
retained.

K. Detection and Quantitation of Nucleic Acid Species
One embodiment of the instant invention comprises a method for identification of HNF1α, HNF1β
or HNF4α mutants in a biological sample by amplifying and detecting nucleic acids corresponding to
HNF1α, HNF1β or HNF4α mutants. The biological sample can be any tissue or fluid in which these
mutants might be present. Various embodiments include β- and α-cells of pancreatic islets, bone marrow
aspirate, bone marrow biopsy, lymph node aspirate, lymph node biopsy, spleen tissue, fine needle
aspirate, skin biopsy or organ tissue biopsy. Other embodiments include samples where the body fluid is
peripheral blood, lymph fluid, ascites, serous fluid, pleural effusion, sputum, cerebrospinal fluid, lacrimal
fluid, stool or urine.

Nucleic acid used as a template for amplification is isolated from cells contained in the biological
sample, according to standard methodologies (Sambrook et al., 1989). The nucleic acid may be genomic
DNA or fractionated or whole cell RNA. Where RNA is used, it may be desired to convert the RNA to a
complementary DNA. In one embodiment, the RNA is whole cell RNA and is used directly as the template
for amplification.

Pairs of primers that selectively hybridize to nucleic acids corresponding to HNF1α, HNF1β or
HNF4α mutants are contacted with the isolated nucleic acid under conditions that permit selective
hybridization. Once hybridized, the nucleic acid:primer complex is contacted with one or more enzymes
that facilitate template-dependent nucleic acid synthesis. Multiple rounds of amplification, also referred
to as "cycles," are conducted until a sufficient amount of amplification product is produced.
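
The effect of repeated cycles can be illustrated with a short Python sketch. It assumes an idealized model in which each cycle multiplies the number of template copies by (1 + efficiency); the function names, the efficiency parameter and the example numbers are assumptions for illustration and do not appear in the present disclosure.

```python
def copies_after_cycles(starting_copies, cycles, efficiency=1.0):
    """Idealized yield: each cycle multiplies the copies by (1 + efficiency)."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return starting_copies * (1.0 + efficiency) ** cycles


def cycles_to_reach(starting_copies, target_copies, efficiency=1.0):
    """Count the cycles needed for the idealized yield to reach target_copies."""
    if starting_copies <= 0 or efficiency <= 0.0:
        raise ValueError("need a positive template count and efficiency")
    copies, cycles = float(starting_copies), 0
    while copies < target_copies:
        copies *= 1.0 + efficiency  # one round of amplification
        cycles += 1
    return cycles


# Hypothetical example: 100 template copies amplified to at least one million.
needed = cycles_to_reach(100, 1_000_000)
```

Real reactions plateau as reagents are consumed, so this model only describes the early, exponential phase.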

Next, the amplification product is detected. In certain applications, the detection may be
performed by visual means. Alternatively, the detection may involve indirect identification of the product
via chemiluminescence, radioactive scintigraphy of incorporated radiolabel or fluorescent label or even via
a system using electrical or thermal impulse signals (Affymax technology; Bellus, 1994).

Following detection, one may compare the results seen in a given patient with a statistically
significant reference group of normal patients and MODY or indeed MODY dependent diabetics and
non-MODY dependent diabetics. In this way, it is possible to correlate the amount of HNF1α, HNF1β or
HNF4α mutants detected with various clinical states.
1. Primers
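
One simple way to express such a comparison numerically is a z-score against the reference group, sketched below in Python. The z-score, the function name and the example values are assumptions chosen for illustration; the present disclosure does not prescribe a particular statistic.

```python
import statistics


def z_score(patient_value, reference_values):
    """Distance of a patient's measurement from the reference mean, in standard deviations."""
    mean = statistics.mean(reference_values)
    spread = statistics.stdev(reference_values)  # sample standard deviation
    if spread == 0:
        raise ValueError("reference values show no variation")
    return (patient_value - mean) / spread


# Hypothetical signal intensities from a reference group and one patient.
reference_group = [10, 12, 14]
patient_z = z_score(16, reference_group)
```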

The term primer, as defined herein, is meant to encompass any nucleic acid that is capable of
priming the synthesis of a nascent nucleic acid in a template-dependent process. Typically, primers are
oligonucleotides from ten to twenty base pairs in length, but longer sequences can be employed. Primers
may be provided in double-stranded or single-stranded form, although the single-stranded form is
preferred.
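
A candidate primer can be checked against the typical length range given above with a few lines of Python. The sketch below also computes a reverse complement and a rough melting temperature using the Wallace rule of thumb (2°C per A or T, 4°C per G or C); that rule, the function names and the example sequence are assumptions added for illustration and are not part of the present disclosure.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence.upper()))


def wallace_tm(sequence):
    """Rough melting temperature in degrees C for a short oligonucleotide."""
    seq = sequence.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def is_typical_length(sequence, shortest=10, longest=20):
    """True if the primer falls within the typical ten-to-twenty base range."""
    return shortest <= len(sequence) <= longest


# Hypothetical 10-base primer, not a sequence from the disclosure.
example_primer = "ATGCATGCAT"
```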

2. Template Dependent Amplification Methods

A number of template dependent processes are available to amplify the marker sequences present
in a given template sample. One of the best known amplification methods is the polymerase chain
reaction (referred to as PCR) which is described in detail in U.S. Patent Nos. 4,683,195, 4,683,202 and
4,800,159, and in Innis et al., 1990, each of which is incorporated herein by reference in its entirety.

Briefly, in PCR, two primer sequences are prepared that are complementary to regions on
opposite complementary strands of the marker sequence. An excess of deoxynucleoside triphosphates
is added to a reaction mixture along with a DNA polymerase, e.g., Taq polymerase. If the marker
sequence is present in a sample, the primers will bind to the marker and the polymerase will cause the
primers to be extended along the marker sequence by adding on nucleotides. By raising and lowering the
temperature of the reaction mixture, the extended primers will dissociate from the marker to form
reaction products, excess primers will bind to the marker and to the reaction products and the process is
repeated.

A reverse transcriptase PCR amplification procedure may be performed in order to quantify the
amount of mRNA amplified. Methods of reverse transcribing RNA into cDNA are well known and
described in Sambrook et al., 1989. Alternative methods for reverse transcription utilize thermostable,
RNA-dependent DNA polymerases. These methods are described in WO 90/07641, filed December 21,
1990. Polymerase chain reaction methodologies are well known in the art.

Another method for amplification is the ligase chain reaction ("LCR"), disclosed in EPA No. 320
308, incorporated herein by reference in its entirety. In LCR, two complementary probe pairs are
prepared, and in the presence of the target sequence, each pair will bind to opposite complementary
strands of the target such that they abut. In the presence of a ligase, the two probe pairs will link to
form a single unit. By temperature cycling, as in PCR, bound ligated units dissociate from the target and
then serve as "target sequences" for ligation of excess probe pairs. U.S. Patent 4,883,750 describes a
method similar to LCR for binding probe pairs to a target sequence.

Qbeta Replicase, described in PCT Application No. PCT/US87/00880, may also be used as still
another amplification method in the present invention. In this method, a replicative sequence of RNA that
has a region complementary to that of a target is added to a sample in the presence of an RNA
polymerase. The polymerase will copy the replicative sequence that can then be detected.

An isothermal amplification method, in which restriction endonucleases and ligases are used to
achieve the amplification of target molecules that contain nucleotide 5'-[alpha-thio]-triphosphates in one
strand of a restriction site, may also be useful in the amplification of nucleic acids in the present
invention (Walker et al., 1992, incorporated herein by reference in its entirety).

Strand Displacement Amplification (SDA) is another method of carrying out isothermal
amplification of nucleic acids which involves multiple rounds of strand displacement and synthesis, i.e.,
nick translation. A similar method, called Repair Chain Reaction (RCR), involves annealing several probes
throughout a region targeted for amplification, followed by a repair reaction in which only two of the four
bases are present. The other two bases can be added as biotinylated derivatives for easy detection. A
similar approach is used in SDA. Target specific sequences can also be detected using a cyclic probe
reaction (CPR). In CPR, a probe having 3' and 5' sequences of non-specific DNA and a middle sequence of
specific RNA is hybridized to DNA that is present in a sample. Upon hybridization, the reaction is treated
with RNase H, and the products of the probe identified as distinctive products that are released after
digestion. The original template is annealed to another cycling probe and the reaction is repeated.

Still other amplification methods described in GB Application No. 2 202 328, and in PCT
Application No. PCT/US89/01025, each of which is incorporated herein by reference in its entirety, may
be used in accordance with the present invention. In the former application, "modified" primers are used
in a PCR-like, template- and enzyme-dependent synthesis. The primers may be modified by labelling with
a capture moiety (e.g., biotin) and/or a detector moiety (e.g., enzyme). In the latter application, an excess
of labeled probes is added to a sample. In the presence of the target sequence, the probe binds and is
cleaved catalytically. After cleavage, the target sequence is released intact to be bound by excess probe.
Cleavage of the labeled probe signals the presence of the target sequence.

Other nucleic acid amplification procedures include transcription-based amplification systems
(TAS), including nucleic acid sequence based amplification (NASBA) and 3SR (Kwoh et al., 1989;
Gingeras et al., PCT Application WO 88/10315, incorporated herein by reference in their entirety). In
NASBA, the nucleic acids can be prepared for amplification by standard phenol/chloroform extraction,
heat denaturation of a clinical sample, treatment with lysis buffer and minispin columns for isolation of
DNA and RNA or guanidinium chloride extraction of RNA. These amplification techniques involve
annealing a primer which has target specific sequences. Following polymerization, DNA/RNA hybrids are
digested with RNase H while double stranded DNA molecules are heat denatured again. In either case the
single stranded DNA is made fully double stranded by addition of second target specific primer, followed
by polymerization. The double-stranded DNA molecules are then multiply transcribed by an RNA
polymerase such as T7 or SP6. In an isothermal cyclic reaction, the RNAs are reverse transcribed into
single stranded DNA, which is then converted to double stranded DNA, and then transcribed once again
with an RNA polymerase such as T7 or SP6. The resulting products, whether truncated or complete,
indicate target specific sequences.

Davey et al., EPA No. 329 822 (incorporated herein by reference in its entirety) disclose a nucleic
acid amplification process involving cyclically synthesizing single-stranded RNA ("ssRNA"), ssDNA, and
double-stranded DNA (dsDNA), which may be used in accordance with the present invention. The ssRNA
is a template for a first primer oligonucleotide, which is elongated by reverse transcriptase
(RNA-dependent DNA polymerase). The RNA is then removed from the resulting DNA:RNA duplex by the action
of ribonuclease H (RNase H, an RNase specific for RNA in duplex with either DNA or RNA). The resultant
ssDNA is a template for a second primer, which also includes the sequences of an RNA polymerase
promoter (exemplified by T7 RNA polymerase) 5' to its homology to the template. This primer is then
extended by DNA polymerase (exemplified by the large "Klenow" fragment of E. coli DNA polymerase I),
resulting in a double-stranded DNA ("dsDNA") molecule, having a sequence identical to that of the original
RNA between the primers and having additionally, at one end, a promoter sequence. This promoter
sequence can be used by the appropriate RNA polymerase to make many RNA copies of the DNA. These
copies can then re-enter the cycle leading to very swift amplification. With proper choice of enzymes,
this amplification can be done isothermally without addition of enzymes at each cycle. Because of the
cyclical nature of this process, the starting sequence can be chosen to be in the form of either DNA or
RNA.

Miller et al., PCT Application WO 89/06700 (incorporated herein by reference in its entirety)
disclose a nucleic acid sequence amplification scheme based on the hybridization of a promoter/primer
sequence to a target single-stranded DNA ("ssDNA") followed by transcription of many RNA copies of the
sequence. This scheme is not cyclic, i.e., new templates are not produced from the resultant RNA
transcripts. Other amplification methods include "RACE" and "one-sided PCR" (Frohman, M.A., In: PCR
PROTOCOLS: A GUIDE TO METHODS AND APPLICATIONS, Academic Press, N.Y., 1990; Ohara et al.,
1989; each herein incorporated by reference in their entirety).

Methods based on ligation of two (or more) oligonucleotides in the presence of nucleic acid having
the sequence of the resulting "di-oligonucleotide", thereby amplifying the di-oligonucleotide, may also be
used in the amplification step of the present invention (Wu et al., 1989, incorporated herein by reference
in its entirety).
3. RNase Protection Assay

Methods for genetic screening by identifying mutations associated with most genetic diseases
such as diabetes must be able to assess large regions of the genome. Once a relevant mutation has been
identified in a given patient, other family members and affected individuals can be screened using
methods which are targeted to that site. The ability to detect dispersed point mutations is critical for
genetic counseling, diagnosis, and early clinical intervention as well as for research into the etiology of
cancer and other genetic disorders. The ideal method for genetic screening would quickly, inexpensively
and accurately detect all types of widely dispersed mutations in genomic DNA, cDNA, and RNA samples,
depending on the specific situation.

Historically, a number of different methods have been used to detect point mutations, including
denaturing gradient gel electrophoresis ("DGGE"), restriction enzyme polymorphism analysis, chemical and
enzymatic cleavage methods, and others (Cotton, 1989). The more common procedures currently in use
include direct sequencing of target regions amplified by PCR™ and single strand conformation
polymorphism analysis ("SSCP").

Another method of screening for point mutations is based on RNase cleavage of base pair
mismatches in RNA/DNA and RNA/RNA heteroduplexes. As used herein, the term "mismatch" is defined
as a region of one or more unpaired or mispaired nucleotides in a double-stranded RNA/RNA, RNA/DNA or
DNA/DNA molecule. This definition thus includes mismatches due to insertion/deletion mutations, as well
as single and multiple base point mutations. U.S. Patent No. 4,946,773 describes an RNase A mismatch
cleavage assay that involves annealing single-stranded DNA or RNA test samples to an RNA probe, and
subsequent treatment of the nucleic acid duplexes with RNase A. After the RNase cleavage reaction, the
RNase is inactivated by proteolytic digestion and organic extraction, and the cleavage products are
denatured by heating and analyzed by electrophoresis on denaturing polyacrylamide gels. For the
detection of mismatches, the single-stranded products of the RNase A treatment, electrophoretically
separated according to size, are compared to similarly treated control duplexes. Samples containing
smaller fragments (cleavage products) not seen in the control duplex are scored as positive.
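
The comparison of sample fragments with a control duplex described above can be sketched in Python as follows. The function name, the use of exact fragment sizes in nucleotides and the example values are assumptions for illustration; sizes read from a real gel are estimates and would need a tolerance.

```python
def score_mismatch(sample_fragments, control_fragments):
    """Flag a sample that shows smaller fragments absent from the control duplex.

    Both arguments are lists of fragment sizes in nucleotides (hypothetical).
    Returns (is_positive, novel_fragment_sizes).
    """
    control = set(control_fragments)
    full_length = max(control)  # size of the intact protected probe
    novel = sorted(
        size for size in set(sample_fragments)
        if size < full_length and size not in control
    )
    return bool(novel), novel


# Hypothetical gel: a 400-nt probe cleaved into 250-nt and 150-nt products.
positive, cleavage_products = score_mismatch([400, 250, 150], [400])
```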

Currently available RNase mismatch cleavage assays, including those performed according to
U.S. Patent No. 4,946,773, require the use of radiolabeled RNA probes. Myers and Maniatis in U.S.
Patent No. 4,946,773 describe the detection of base pair mismatches using RNase A. Other
investigators have described the use of the enzyme RNase I in mismatch assays. Because it has
broader cleavage specificity than RNase A, RNase I would be a desirable enzyme to employ in the
detection of base pair mismatches if components can be found to decrease the extent of non-specific
cleavage and increase the frequency of cleavage of mismatches. The use of RNase I for mismatch
detection is described in literature from Promega Biotech. Promega markets a kit containing RNase I that
is shown in their literature to cleave three out of four known mismatches, provided the enzyme level is
sufficiently high.

The RNase protection assay as first described by Melton et al. (1984) was used to detect and 
map the ends of specific mRNA targets in solution. The assay relies on being able to easily generate high 
specific activity radiolabeled RNA probes complementary to the mRNA of interest by in vitro 
transcription. Originally, the templates for in vitro transcription were recombinant plasmids containing 
bacteriophage promoters. The probes are mixed with total cellular RNA samples to permit hybridization 
to their complementary targets, then the mixture is treated with RNase to degrade excess unhybridized 
probe. Also, as originally intended, the RNase used is specific for single-stranded RNA, so that hybridized 
double-stranded probe is protected from degradation. After inactivation and removal of the RNase, the 
protected probe (which is proportional in amount to the amount of target mRNA that was present) is 
recovered and analyzed on a polyacrylamide gel. 

The RNase Protection assay was adapted for detection of single base mutations by Myers and 
Maniatis (1985) and by Winter and Perucho (1985). In this type of RNase A mismatch cleavage assay, 
radiolabeled RNA probes transcribed in vitro from wild type sequences are hybridized to complementary 
target regions derived from test samples. The test target generally comprises DNA (either genomic DNA 
or DNA amplified by cloning in plasmids or by PCR™), although RNA targets (endogenous mRNA) have 
occasionally been used (Gibbs and Caskey, 1987; Winter et al., 1986). If single nucleotide (or greater) 
sequence differences occur between the hybridized probe and target, the resulting disruption in Watson- 
Crick hydrogen bonding at that position ("mismatch") can be recognized and cleaved in some cases by 
single strand specific ribonuclease. To date, RNase A has been used almost exclusively for cleavage of 
single-base mismatches, although RNase I has recently been shown as useful also for mismatch cleavage. 
There are recent descriptions of using the MutS protein and other DNA-repair enzymes for detection of 
single-base mismatches (Ellis et al., 1994; Lishanski et al., 1994). 

By hybridizing each strand of the wild type probe in RNase cleavage mismatch assays separately 
to the complementary Sense and Antisense strands of the test target, two different complementary 
mismatches (for example, A-C and G-U or G-T), and therefore two chances for detecting each mutation by 
separate cleavage events, are provided. Myers et al. (1985) used the RNase A cleavage assay to screen 
615 bp regions of the human β-globin gene contained in recombinant plasmid targets. By probing with 
both strands, they were able to detect most, but not all, of the β-globin mutations in their model system. 
The collection of mutants included examples of all the 12 possible types of mismatches between RNA and 
DNA: rA/dA, rC/dC, rU/dC, rC/dA, rC/dT, rU/dG, rG/dA, rG/dG, rU/dT, rA/dC, rG/dT, and rA/dG. 

Myers et al. (1985) showed that certain types of mismatch were more frequently and more 
completely cleaved by RNase A than others. For example, the rC/dA, rC/dC, and rC/dT mismatches were 
cleaved in all cases, while the rG/dA mismatch was only cleaved in 13% of the cases tested and the 
rG/dT mismatch was almost completely resistant to cleavage. In general, the complement of a difficult- 
to-detect mismatch was much easier to detect. For example, the refractory rG/dT mismatch generated by 
probing a G to A mutant target with a wild type sense-strand probe is complemented by the easily 
cleaved rC/dA mismatch generated by probing the mutant target with the wild type antisense strand. By 
probing both target strands, Myers and Maniatis (1986) estimated that at least 50% of all single-base 
mutations would be detected by the RNase A cleavage assay. These authors stated that approximately 
one-third of all possible types of single-base substitutions would be detected by using a single probe for 
just one strand of the target DNA (Myers et al., 1985). 
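
The relationship between a point mutation and the two complementary mismatches it produces can be illustrated with a short sketch. This is only an illustration under stated assumptions: the function and variable names are hypothetical, bases are written as they appear on the sense strand of the target, and the probes are assumed to be wild type RNA transcripts of each strand. It reproduces the G to A example given above and counts the distinct RNA/DNA mismatch types.

```python
# Illustrative sketch only; names are hypothetical and not taken from the specification.
DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def to_rna(base):
    """Return the RNA equivalent of a DNA base (T becomes U)."""
    return "U" if base == "T" else base


def probe_mismatches(wild_type, mutant):
    """Return the mismatches seen by wild type sense and antisense RNA probes.

    Both bases are given as they appear on the sense strand of the target.
    """
    if wild_type == mutant:
        raise ValueError("wild type and mutant bases must differ")
    # The sense-strand probe anneals to the antisense strand of the mutant target.
    sense = f"r{to_rna(wild_type)}/d{DNA_COMPLEMENT[mutant]}"
    # The antisense-strand probe anneals to the sense strand of the mutant target.
    antisense = f"r{to_rna(DNA_COMPLEMENT[wild_type])}/d{mutant}"
    return sense, antisense


# Collect every mismatch type produced by the 12 possible single-base substitutions.
all_mismatches = {
    mismatch
    for wt in "ACGT"
    for mut in "ACGT"
    if wt != mut
    for mismatch in probe_mismatches(wt, mut)
}

print(probe_mismatches("G", "A"))  # ('rG/dT', 'rC/dA')
print(len(all_mismatches))         # 12
```

Probing with both strands pairs each refractory mismatch with its complement, which is why the second probe gives a second chance of detecting the same mutation.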

In the typical RNase cleavage assays, the separating gels are run under denaturing conditions for 
analysis of the cleavage products. This requires the RNase to be inactivated by treating the reaction 
with protease (usually Proteinase K, often in the presence of SDS) to degrade the RNase. This reaction is 
generally followed by an organic extraction with a phenol/chloroform solution to remove proteins and 
residual RNase activity. The organic extraction is then followed by concentration and recovery of the 
cleavage products by alcohol precipitation (Myers et al., 1985; Winter et al., 1985; Theophilus et al., 
1989). 

4. Separation Methods 

Following amplification, it may be desirable to separate the amplification product from the 
template and the excess primer for the purpose of determining whether specific amplification has 
occurred. [...] thin-layer and gas chromatography (Freifelder, 1982). 

5. Identification Methods 

Amplification products must be visualized in order to confirm amplification of the marker 
sequences. One typical visualization method involves staining of a gel with ethidium bromide and 
visualization under UV light. Alternatively, if the amplification products are integrally labeled with radio- or 
fluorometrically-labeled nucleotides, the amplification products can then be exposed to x-ray film [...] 

[...] probe is conjugated to a binding partner, such as an antibody or biotin. 






6. Kit Components 

All the essential materials and reagents required for detecting MODY markers in a biological 
sample may be assembled together in a kit. This generally will comprise pre-selected primers for specific 
markers. Also included may be enzymes suitable for amplifying nucleic acids including various 
polymerases (RT, Taq, etc.), deoxynucleotides and buffers to provide the necessary reaction mixture for 
amplification. 

Such kits generally comprise, in suitable means, distinct containers for each individual 
reagent and enzyme as well as for each marker primer pair. Preferred pairs of primers for amplifying 
nucleic acids are selected to amplify the sequences specified in SEQ ID NO:3, SEQ ID NO:5 or SEQ ID 
NO:5, along with the cDNAs for HNF1α (SEQ ID NO:1), HNF1β (SEQ ID NO:128) and HNF4α (SEQ ID 
NO:78). In other embodiments preferred pairs of primers for amplification are selected to amplify 
sequences specified in SEQ ID NO:34, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID 
NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54. 

In another embodiment, such kits will comprise hybridization probes specific for MODY3 chosen 
from a group including nucleic acids corresponding to the sequences specified in SEQ ID NO:1, SEQ ID 
NO:3, SEQ ID NO:5, and SEQ ID NO:7, along with the cDNAs for HNF1α (SEQ ID NO:1). In yet another 
embodiment such kits will comprise probes specific for MODY1 chosen from a group including nucleic 
acids corresponding to the sequences specified in SEQ ID NO:78, SEQ ID NO:34, SEQ ID NO:36, SEQ ID 
NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID 
NO:52, SEQ ID NO:54, HNF4α. In still another embodiment such kits will comprise probes specific for 
MODY4 chosen from a group including nucleic acids corresponding to the sequences specified in SEQ ID 
NO:128, HNF1β or any of the exons shown in FIG. 27A-FIG. 27I, or Genbank accession numbers U90279- 
90287 and U96079, incorporated herein by reference. 

Such kits generally will comprise, in suitable means, distinct containers for each individual 
reagent and enzyme as well as for each marker hybridization probe. 

L. Use of RNA Fingerprinting to Identify MODY3, MODY4, and MODY1 Markers 

RNA fingerprinting is a means by which RNAs isolated from many different tissues, cell types or 
treatment groups can be sampled simultaneously to identify RNAs whose relative abundances vary. Two 
forms of this technology were developed simultaneously and reported in 1992 as RNA fingerprinting by 
differential display (Liang and Pardee, 1992; Welsh et al., 1992). (See also Liang and Pardee, U.S. patent 
[...] in its entirety.) Some of the experiments described here were performed similarly to Donahue et al. [...] 

All forms of RNA fingerprinting by PCR are theoretically similar but differ in their primer design 
and application. The most striking difference between differential display and other methods of RNA 
fingerprinting is that differential display utilizes anchoring primers that hybridize to the poly A tails of 
mRNAs. As a consequence, the PCR products amplified in differential display are biased towards the 3' 
untranslated regions of mRNAs. 

The basic technique of differential display has been described in detail (Liang and Pardee, 1992). 
[...] PCR techniques, utilizing the same primers. The resulting [...] et al., 1994; Watson et al., 1994 [...] 
RNA fingerprinting technique to identify genes that are differentially [...] 

Reverse transcription (RT) of RNA to cDNA followed by relative quantitative PCR (RT-PCR) can be 
used to determine the relative concentrations of specific mRNA species isolated from MODY3, [...] 
patients. By determining that the concentration of a specific mRNA species varies, it is shown that the 
gene encoding the specific mRNA species is differentially expressed. This technique can be used to 
confirm that mRNA transcripts shown to be differentially regulated by RNA fingerprinting 
are differentially expressed in MODY related diabetes. 

In PCR, the number of molecules of the amplified target DNA increases by a factor approaching 
two with every cycle of the reaction until some reagent becomes limiting. Thereafter, the rate of 
amplification becomes increasingly diminished until there is no increase in the amplified target between 
cycles. If a graph is plotted in which the cycle number is on the X axis and the log of the concentration of 
the amplified target DNA is on the Y axis, a curved line of characteristic shape is formed by connecting 
the plotted points. Beginning with the first cycle, the slope of the line is positive and constant. This is 
said to be the linear portion of the curve. After a reagent becomes limiting, the slope of the line begins to 
decrease and eventually becomes zero. At this point the concentration of the amplified target DNA 
becomes asymptotic to some fixed value. This is said to be the plateau portion of the curve. 

The concentration of the target DNA in the linear portion of the PCR amplification is directly 
proportional to the starting concentration of the target before the reaction began. By determining the 
concentration of the amplified products of the target DNA in PCR reactions that have completed the same 
number of cycles and are in their linear ranges, it is possible to determine the relative concentrations of 
the specific target sequence in the original DNA mixture. If the DNA mixtures are cDNAs synthesized 
from RNAs isolated from different tissues or cells, the relative abundances of the specific mRNA from 
which the target sequence was derived can be determined for the respective tissues or cells. This direct 
proportionality between the concentration of the PCR products and the relative mRNA abundances is only 
true in the linear range of the PCR reaction. 
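
The behavior described above can be illustrated with a minimal numerical sketch. It is not a model taken from the specification: the per-cycle efficiency rule (efficiency falling linearly as product approaches a reagent-limited ceiling), the ceiling value and the starting copy numbers are all assumptions chosen only to show the two regimes. Two reactions that differ four-fold in starting template keep that ratio while both are in the linear portion of the curve and lose it once both reach the plateau.

```python
# Minimal sketch; the efficiency rule and all numbers are illustrative assumptions.
def simulate_pcr(start_copies, cycles, plateau_copies=1e12):
    """Return the product copy number after each cycle (index 0 is the input)."""
    copies = float(start_copies)
    history = [copies]
    for _ in range(cycles):
        # Efficiency is close to 1 (doubling) early and falls to 0 at the plateau.
        efficiency = max(0.0, 1.0 - copies / plateau_copies)
        copies += copies * efficiency
        history.append(copies)
    return history


low = simulate_pcr(1_000, 45)   # hypothetical sample with less starting target
high = simulate_pcr(4_000, 45)  # hypothetical sample with four times more target

ratio_linear = high[15] / low[15]    # both reactions still in the linear portion
ratio_plateau = high[45] / low[45]   # both reactions at the plateau

print(f"ratio after 15 cycles: {ratio_linear:.3f}")
print(f"ratio after 45 cycles: {ratio_plateau:.3f}")
```

In this sketch the ratio measured at cycle 15 stays close to the four-fold input difference, whereas the ratio at cycle 45 approaches one, which is the reason products must be sampled before the plateau.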

The final concentration of the target DNA in the plateau portion of the curve is determined by the 
availability of reagents in the reaction mix and is independent of the original concentration of target DNA. 
Therefore, the first condition that must be met before the relative abundances of a mRNA species can be 
determined by RT-PCR for a collection of RNA populations is that the concentrations of the amplified PCR 
products must be sampled when the PCR reactions are in the linear portion of their curves. 

The second condition that must be met for an RT-PCR experiment to successfully determine the 
relative abundances of a particular mRNA species is that relative concentrations of the amplifiable cDNAs 
must be normalized to some independent standard. The goal of an RT-PCR experiment is to determine the 
abundance of a particular mRNA species relative to the average abundance of all mRNA species in the 
sample. In the experiments described below, mRNAs for β-actin, asparagine synthetase and lipocortin II 
[...] 

[...] These strategies are effective if the products of the PCR amplifications are sampled during their 
linear phases. If the products are sampled when the reactions are approaching the 
plateau phase, then the less abundant product becomes relatively overrepresented. Comparisons of relative 
abundances made for many different RNA samples, as is the case when examining [...] 
standard is much more abundant than the target. 

[...] an external standard protocol. These assays sample the PCR products in the linear portion of their 
amplification curves. The number of PCR cycles that are optimal for sampling [...] 
determined for each target cDNA fragment. In addition, the reverse transcriptase products of each [...] 
amplifiable cDNAs. This consideration is important since the assay measures absolute mRNA 
abundance. Absolute mRNA abundance can be used as a measure of differential gene expression only in 
normalized samples. While empirical determination of the linear range of the amplification curve [...] 
assays can be superior to those derived from the relative quantitative RT-PCR assay with an internal 
standard. 



One reason for this advantage is that without the internal standard/competitor, all of the 
reagents can be converted into a single PCR product in the linear range of the amplification curve, thus 
increasing the sensitivity of the assay. Another reason is that with only one PCR product, display of the 
product on an electrophoretic gel or another display method becomes less complex, has less background 
and is easier to interpret. 
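
The normalization requirement discussed in this section can be sketched as a simple calculation. The sketch below is an assumption-laden illustration, not a protocol from the specification: the sample names and signal values are hypothetical, and the standard is taken to be any independent transcript measured in the same cDNA preparation. Each target signal is divided by the standard signal from the same sample before samples are compared.

```python
# Illustrative sketch; sample names and signal values are hypothetical.
def normalize_to_standard(target_signal, standard_signal):
    """Express a target PCR signal relative to the standard from the same sample."""
    if standard_signal <= 0:
        raise ValueError("standard signal must be positive")
    return target_signal / standard_signal


# Hypothetical band intensities sampled in the linear range of amplification.
samples = {
    "sample_a": {"target": 1200.0, "standard": 40000.0},
    "sample_b": {"target": 2400.0, "standard": 80000.0},
}

normalized = {
    name: normalize_to_standard(values["target"], values["standard"])
    for name, values in samples.items()
}

# Fold change of the target between samples after normalization.
fold_change = normalized["sample_b"] / normalized["sample_a"]
print(fold_change)
```

Here the raw target signal doubles between the two hypothetical samples, but so does the standard, so the normalized fold change is one; without normalization the difference in cDNA input would be mistaken for differential expression.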

M. Methods for Activation of Gene Expression 

In one embodiment of the present invention, there are provided methods for the increased gene 
expression or activation in a cell. This is particularly useful where there is an aberration in the gene product 
or gene expression is not sufficient for normal function. This will allow for the alleviation of symptoms of 
MODY3 type diabetes experienced as a result of mutation in HNF1α, MODY4 type diabetes experienced as a 
result of mutation in HNF1β and MODY1 type diabetes experienced as a result of mutation in HNF4α. 

The general approach to increasing gene expression as mediated by HNF1α, HNF1β or HNF4α 
according to the present invention, will be to provide a cell with an HNF1α, HNF1β or HNF4α polypeptide, 
thereby permitting the transcription promotional activity of HNF1α, HNF1β or HNF4α to take effect. While 
it is conceivable that the protein may be delivered directly, a preferred embodiment involves providing a 
nucleic acid encoding an HNF1α, HNF1β or HNF4α polypeptide, i.e., an HNF1α, HNF1β or HNF4α gene, to 
the cell. Following this provision, the HNF1α, HNF1β or HNF4α polypeptide is synthesized by the host cell's 
transcriptional and translational machinery, as well as any that may be provided by the expression construct. 
Cis-acting regulatory elements necessary to support the expression of the HNF1α, HNF1β or HNF4α gene 
will be provided, in the form of an expression construct. It also is possible that expression of the virally 
encoded HNF1α, HNF1β or HNF4α could be stimulated or enhanced, or the expressed polypeptide 
stabilized, thereby achieving the same or similar effect. 

In order to effect expression of constructs encoding HNF1α, HNF1β or HNF4α genes, the 
expression construct must be delivered into a cell. One mechanism for delivery is via viral infection, where 
the expression construct is encapsidated in a viral particle which will deliver either a replicating or non- 
replicating nucleic acid. In certain embodiments an HSV vector is used, although virtually any vector would 
suffice. 




Several non-viral methods for the transfer of expression constructs into cultured mammalian cells 
also are contemplated by the present invention. These include calcium phosphate precipitation (Graham and 
Van Der Eb, 1973; Chen and Okayama, 1987; Rippe et al., 1990), DEAE-dextran (Gopal, 1985), 
electroporation (Tur-Kaspa et al., 1986; Potter et al., 1984), direct microinjection (Harland and Weintraub, 
1985), DNA-loaded liposomes (Nicolau and Sene, 1982; Fraley et al., 1979) and lipofectamine-DNA 
complexes, cell sonication (Fechheimer et al., 1987), gene bombardment using high velocity microprojectiles 
(Yang et al., 1990), and receptor-mediated transfection (Wu and Wu, 1987; Wu and Wu, 1988). Some of 
these techniques may be successfully adapted for in vivo or ex vivo use, as discussed below. 

In another embodiment of the invention, the expression construct may simply consist of naked 
recombinant DNA or plasmids. Transfer of the construct may be performed by any of the methods 
mentioned above which physically or chemically permeabilize the cell membrane. This is particularly 
applicable for transfer in vitro but it may be applied to in vivo use as well. Another embodiment of the 
invention for transferring a naked DNA expression construct into cells may involve particle bombardment. 
This method depends on the ability to accelerate DNA coated microprojectiles to a high velocity allowing 
them to pierce cell membranes and enter cells without killing them (Klein et al., 1987). Several devices for 
accelerating small particles have been developed. One such device relies on a high voltage discharge to 
generate an electrical current, which in turn provides the motive force (Yang et al., 1990). The 
microprojectiles used have consisted of biologically inert substances such as tungsten or gold beads. 

In a further embodiment of the invention, the expression construct may be entrapped in a liposome. 
Liposomes are vesicular structures characterized by a phospholipid bilayer membrane and an inner aqueous 
medium. Multilamellar liposomes have multiple lipid layers separated by aqueous medium. They form 
spontaneously when phospholipids are suspended in an excess of aqueous solution. The lipid components 
undergo self-rearrangement before the formation of closed structures and entrap water and dissolved 
solutes between the lipid bilayers (Ghosh and Bachhawat, 1991). Also contemplated are lipofectamine-DNA 
complexes. 

Liposome-mediated nucleic acid delivery and expression of foreign DNA in vitro has been very 
successful. Wong et al. (1980) demonstrated the feasibility of liposome-mediated delivery and expression of 
foreign DNA in cultured chick embryo, HeLa and hepatoma cells. In certain embodiments of the invention, 
the liposome may be complexed with a hemagglutinating virus (HVJ). This has been shown to facilitate 
fusion with the cell membrane and promote cell entry of liposome-encapsulated DNA (Kaneda et al., 1989). 
In other embodiments, the liposome may be complexed or employed in conjunction with nuclear non-histone 
chromosomal proteins (HMG-1) (Kato et al., 1991). In yet further embodiments, the liposome may be 
complexed or employed in conjunction with both HVJ and HMG-1. In other embodiments, the delivery vehicle 
may comprise a ligand and a liposome. Where a bacterial promoter is employed in the DNA construct, it also 
will be desirable to include within the liposome an appropriate bacterial polymerase. 

Other expression constructs which can be employed to deliver a nucleic acid encoding an HNF1α, 
HNF1β, or HNF4α transgene into cells are receptor-mediated delivery vehicles. These take advantage of the 
selective uptake of macromolecules by receptor-mediated endocytosis in almost all eukaryotic cells. 
Because of the cell type-specific distribution of various receptors, the delivery can be highly specific (Wu and 
Wu, 1993). 

Receptor-mediated gene targeting vehicles generally consist of two components: a cell receptor- 
specific ligand and a DNA-binding agent. Several ligands have been used for receptor-mediated gene 
transfer. The most extensively characterized ligands are asialoorosomucoid (ASOR) (Wu and Wu, 1987) and 
transferrin (Wagner et al., 1990). Recently, a synthetic neoglycoprotein, which recognizes the same 
receptor as ASOR, has been used as a gene delivery vehicle (Ferkol et al., 1993; Perales et al., 1994). 
Mannose can be used to target the mannose receptor on liver cells. Also, antibodies to CD5 (CLL), CD22 
(lymphoma), CD25 (T-cell leukemia) and MAA (melanoma) can similarly be used as targeting moieties. In 
other embodiments, the delivery vehicle may comprise a ligand and a liposome. 

Primary mammalian cell cultures may be prepared in various ways. In order for the cells to be kept 
viable while in vitro and in contact with the expression construct, it is necessary to ensure that the cells 
maintain contact with the correct ratio of oxygen and carbon dioxide and nutrients but are protected from 
microbial contamination. Cell culture techniques are well documented and are disclosed herein by reference 
(Freshner, 1992). 

One embodiment of the foregoing involves the use of gene transfer to immortalize cells for the 
production of proteins. The gene for the protein of interest may be transferred as described above into 
appropriate host cells followed by culture of cells under the appropriate conditions. The gene for virtually 
any polypeptide may be employed in this manner. The generation of recombinant expression vectors and 
the elements included are discussed above. Alternatively, the protein to be produced may be an 
endogenous protein normally synthesized by the cell in question. 

Examples of useful mammalian host cell lines are Vero and HeLa cells and cell lines of Chinese 
hamster ovary, W138, BHK, COS-7, 293, HepG2, NIH3T3, RIN and MDCK cells. In addition, a host cell 
strain may be chosen that modulates the expression of the inserted sequences, or modifies and processes 
the gene product in the manner desired. Such modifications (e.g., glycosylation) and processing (e.g., 
cleavage) of protein products may be important for the function of the protein. Different host cells have 
characteristic and specific mechanisms for the post-translational processing and modification of proteins. 
Appropriate cell lines or host systems can be chosen to ensure the correct modification and processing of 
the foreign protein expressed. 

A number of selection systems may be used including, but not limited to, HSV thymidine kinase, 
hypoxanthine-guanine phosphoribosyltransferase and adenine phosphoribosyltransferase genes, in tk-, 
hgprt- or aprt- cells, respectively. Also, anti-metabolite resistance can be used as the basis of selection 
for dhfr, that confers resistance to methotrexate; gpt, that confers resistance to mycophenolic acid; neo, 
that confers resistance to the aminoglycoside G418; and hygro, that confers resistance to hygromycin. 

Animal cells can be propagated in vitro in two modes: as non-anchorage dependent cells growing 
in suspension throughout the bulk of the culture or as anchorage-dependent cells requiring attachment to 
a solid substrate for their propagation (i.e., a monolayer type of cell growth). 

Non-anchorage dependent or suspension cultures from continuous established cell lines are the 
most widely used means of large scale production of cells and cell products. However, suspension 
cultured cells have limitations, such as tumorigenic potential and lower protein production than adherent 
cells. 

Large scale suspension culture of mammalian cells in stirred tanks is a common method for 
production of recombinant proteins. Two suspension culture reactor designs are in wide use: the stirred 
reactor and the airlift reactor. The stirred design has successfully been used on an 8000 liter capacity 
for the production of interferon. Cells are grown in a stainless steel tank with a height-to-diameter ratio 
of 1:1 to 3:1. The culture is usually mixed with one or more agitators, based on bladed disks or marine 
propeller patterns. Agitator systems offering less shear forces than blades have been described. 
Agitation may be driven either directly or indirectly by magnetically coupled drives. Indirect drives reduce 
the risk of microbial contamination through seals on stirrer shafts. 

The airlift reactor, also initially described for microbial fermentation and later adapted for 
mammalian culture, relies on a gas stream to both mix and oxygenate the culture. The gas stream enters 
a riser section of the reactor and drives circulation. Gas disengages at the culture surface, causing 
denser liquid free of gas bubbles to travel downward in the downcomer section of the reactor. The main 
advantage of this design is the simplicity and lack of need for mechanical mixing. Typically, the height-to- 
diameter ratio is 10:1. The airlift reactor scales up relatively easily, has good mass transfer of gases and 
generates relatively low shear forces. 
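
The height-to-diameter ratios quoted for the two reactor designs translate into vessel dimensions through the volume of a cylinder. The sketch below is illustrative only and rests on assumptions not stated in the text: the vessel is treated as an ideal cylinder, the ratio is applied to the liquid working volume, and the volume is given in liters. With V = π(d/2)²h and h = r·d, the diameter is d = (4V/(πr))^(1/3).

```python
import math


# Illustrative sketch; assumes an ideal cylindrical working volume given in liters.
def cylinder_dimensions(volume_liters, height_to_diameter):
    """Return (diameter, height) in meters for a given volume and aspect ratio."""
    if volume_liters <= 0 or height_to_diameter <= 0:
        raise ValueError("volume and ratio must be positive")
    volume_m3 = volume_liters / 1000.0
    # From V = pi * (d / 2)**2 * h with h = ratio * d.
    diameter = (4.0 * volume_m3 / (math.pi * height_to_diameter)) ** (1.0 / 3.0)
    height = height_to_diameter * diameter
    return diameter, height


# Compare a stirred-tank ratio (3:1) with the typical airlift ratio (10:1)
# for the same hypothetical working volume.
for label, ratio in (("stirred, 3:1", 3.0), ("airlift, 10:1", 10.0)):
    d, h = cylinder_dimensions(8000.0, ratio)
    print(f"{label}: diameter {d:.2f} m, height {h:.2f} m")
```

For a fixed volume, the taller airlift geometry gives a narrower and considerably higher column than the stirred tank, which is consistent with its reliance on a rising gas stream for circulation.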

N. Methods for Blocking Mutant HNF1α, HNF1β and HNF4α Action 

In another embodiment of the present invention, there is contemplated the method of blocking the 
function of mutated HNF1α in MODY3, HNF1β in MODY4, and HNF4α in MODY1. In this way it may be 
possible to curtail the effects of the mutation in diabetes. In addition, it may prove effective to use this 
sort of therapeutic intervention in combination with more traditional diabetes therapies, such as the 
administration of insulin. 

The general form that this aspect of the invention will take is the provision, to a cell, of an agent 
that will inhibit mutated HNF1α, HNF1β or HNF4α function. Four such agents are contemplated. First, 
one may employ an antisense nucleic acid that will hybridize either to the mutated HNF1α, HNF1β or 
HNF4α gene or the mutated HNF1α, HNF1β or HNF4α gene transcript, thereby preventing transcription 
or translation, respectively. The considerations relevant to the design of antisense constructs have been 
presented above. Second, one may utilize a mutated HNF1α-, HNF1β- or HNF4α-binding protein or 
peptide, for example, a peptidomimetic or an antibody that binds immunologically to a mutated HNF1α, 
HNF1β or HNF4α, respectively; the binding of either will block or reduce the activity of the mutated 
HNF1α, HNF1β and HNF4α, respectively. The methods of making and selecting peptide binding partners 
and antibodies are well known to those of skill in the art. Third, one may provide to the cell an antagonist 
of mutated HNF1α, HNF1β or HNF4α, for example, the transactivation target sequence, alone or coupled 
to another agent. And fourth, one may provide an agent that binds to the mutated HNF1α, HNF1β or 
HNF4α target without the same functional result as would arise with mutated HNF1α, HNF1β or HNF4α 
binding. 

Provision of an HNF1α, HNF1β or HNF4α gene, a mutated HNF1α, HNF1β or HNF4α protein, or 
a mutated HNF1α, HNF1β or HNF4α antagonist, would be according to any appropriate pharmaceutical 
route. The formulation of such compositions and their delivery to tissues is discussed below. The method 
by which the nucleic acid, protein or chemical is transferred, along with the preferred delivery route, will be 
selected based on the particular site to be treated. Those of skill in the art are capable of determining the 
most appropriate methods based on the relevant clinical considerations. 

Many of the gene transfer techniques that generally are applied in vitro can be adapted for ex 
vivo or in vivo use. For example, selected organs including the liver, skin, and muscle tissue of rats and 
mice have been bombarded in vivo (Yang et al., 1990; Zelenin et al., 1991). Naked DNA also has been used 
in clinical settings to effect gene therapy. These approaches may require surgical exposure of the target 
tissue or direct target tissue injection. Nicolau et al. (1987) accomplished successful liposome-mediated 
gene transfer in rats after intravenous injection. 

Dubensky et al. (1984) successfully injected polyomavirus DNA in the form of CaPO4 precipitates 
into liver and spleen of adult and newborn mice demonstrating active viral replication and acute infection. 
Benvenisty and Neshif (1986) also demonstrated that direct intraperitoneal injection of CaPO4 precipitated 
plasmids results in expression of the transfected genes. Thus, it is envisioned that DNA encoding an 
antisense construct also may be transferred in a similar manner in vivo. 

Where the embodiment involves the use of an antibody that recognizes a mutated HNF1α, HNF1β 
or HNF4α polypeptide, consideration must be given to the mechanism by which the antibody is introduced 
into the cell cytoplasm. This can be accomplished, for example, by providing an expression construct that 
encodes a single-chain antibody version of the antibody to be provided. Most of the discussion above 
relating to expression constructs for antisense versions of HNF1α, HNF1β or HNF4α genes will be relevant 
to this aspect of the invention. Alternatively, it is possible to present a bifunctional antibody, where one 
antigen binding arm of the antibody recognizes an HNF1α, HNF1β or HNF4α polypeptide and the other 
antigen binding arm recognizes a receptor on the surface of the cell to be targeted. Examples of suitable 
receptors would be an HSV glycoprotein such as gB, gC, gD, or gH. In addition, it may be possible to exploit 
the Fc binding function associated with HSV gE, thereby obviating the need to sacrifice one arm of the 
antibody for purposes of cell targeting. 

Advantageously, one may combine this approach with more conventional diabetes therapy options. 

O. Pharmaceuticals and In vivo Methods for the Treatment of Disease 

Aqueous pharmaceutical compositions of the present invention will have an effective amount of 
an HNF1α, HNF1β or HNF4α expression construct, an antisense HNF1α, HNF1β or HNF4α expression 
construct, an expression construct that encodes a therapeutic gene along with HNF1α, HNF1β or 
HNF4α, a protein or compound that inhibits mutated HNF1α, HNF1β or HNF4α function respectively, 
such as an anti-mutant HNF1α antibody, an anti-mutant HNF1β antibody or an anti-mutant HNF4α 
antibody, or a mutated HNF1α polypeptide, mutated HNF1β polypeptide or a mutated HNF4α 
polypeptide. Such compositions generally will be dissolved or dispersed in a pharmaceutically acceptable 
carrier or aqueous medium. An "effective amount," for the purposes of therapy, is defined as that amount 
that causes a clinically measurable difference in the condition of the subject. This amount will vary 
depending on the substance, the condition of the patient, the type of treatment, the location of the lesion, 
etc. 

The phrases "pharmaceutically or pharmacologically acceptable" refer to molecular entities and 
compositions that do not produce an adverse, allergic or other untoward reaction when administered to 
an animal, or human, as appropriate. As used herein, "pharmaceutically acceptable carrier" includes any 
and all solvents, dispersion media, coatings, antibacterial and antifungal agents, isotonic and absorption 
delaying agents and the like. The use of such media and agents for pharmaceutically active substances is 
well known in the art. Except insofar as any conventional media or agent is incompatible with the active 
ingredients, its use in the therapeutic compositions is contemplated. Supplementary active ingredients, 
such as other anti-diabetic agents, can also be incorporated into the compositions. 

In addition to the compounds formulated for parenteral administration, such as those for 
intravenous or intramuscular injection, other pharmaceutically acceptable forms include, e.g., tablets or 
other solids for oral administration; time release capsules; and any other form currently used, including 
cremes, lotions, mouthwashes, inhalants and the like. 

The active compounds of the present invention will often be formulated for parenteral
administration, e.g., formulated for injection via the intravenous, intramuscular, subcutaneous, or even
intraperitoneal routes. The preparation of an aqueous composition that contains mutated HNF1α, HNF1β
or HNF4α inhibitory compounds alone or in combination with conventional diabetes therapy agents as
active ingredients will be known to those of skill in the art in light of the present disclosure. Typically,
such compositions can be prepared as injectables, either as liquid solutions or suspensions; solid forms
suitable for using to prepare solutions or suspensions upon the addition of a liquid prior to injection can
also be prepared; and the preparations can also be emulsified.

Solutions of the active compounds as free base or pharmacologically acceptable salts can be
prepared in water suitably mixed with a surfactant, such as hydroxypropylcellulose. Dispersions can also
be prepared in glycerol, liquid polyethylene glycols, and mixtures thereof and in oils. Under ordinary
conditions of storage and use, these preparations contain a preservative to prevent the growth of
microorganisms.

The pharmaceutical forms suitable for injectable use include sterile aqueous solutions or
dispersions; formulations including sesame oil, peanut oil or aqueous propylene glycol; and sterile powders
for the extemporaneous preparation of sterile injectable solutions or dispersions. In many cases, the form
must be sterile and must be fluid to the extent that easy syringability exists. It must be stable under the
conditions of manufacture and storage and must be preserved against the contaminating action of
microorganisms, such as bacteria and fungi.

The active compounds may be formulated into a composition in a neutral or salt form.
Pharmaceutically acceptable salts include the acid addition salts (formed with the free amino groups of
the protein) and which are formed with inorganic acids such as, for example, hydrochloric or phosphoric
acids, or such organic acids as acetic, oxalic, tartaric, mandelic, and the like. Salts formed with the free
carboxyl groups can also be derived from inorganic bases such as, for example, sodium, potassium,
ammonium, calcium, or ferric hydroxides, and such organic bases as isopropylamine, trimethylamine,
histidine, procaine and the like.

The carrier also can be a solvent or dispersion medium containing, for example, water, ethanol,
polyol (for example, glycerol, propylene glycol, and liquid polyethylene glycol, and the like), suitable
mixtures thereof, and vegetable oils. The proper fluidity can be maintained, for example, by the use of a
coating such as lecithin, by the maintenance of the required particle size in the case of dispersion and by
the use of surfactants. The prevention of the action of microorganisms can be brought about by various
antibacterial and antifungal agents, for example, parabens, chlorobutanol, phenol, sorbic acid, thimerosal,
and the like. In many cases, it will be preferable to include isotonic agents, for example, sugars or sodium
chloride. Prolonged absorption of the injectable compositions can be brought about by the use in the
compositions of agents delaying absorption, for example, aluminum monostearate and gelatin.

Sterile injectable solutions are prepared by incorporating the active compounds in the required 
amount in the appropriate solvent with various of the other ingredients enumerated above, as required 
followed by filtered sterilization. Generally, dispersions are prepared by incorporating the various 
sterilized active ingredients into a sterile vehicle which contains the basic dispersion medium and the 
required other ingredients from those enumerated above. In the case of sterile powders for the 
preparation of sterile injectable solutions, the preferred methods of preparation are vacuum-drying and 
freeze-drying techniques which yield a powder of the active ingredient plus any additional desired 
ingredient from a previously sterile-filtered solution thereof. 

Upon formulation, solutions will be administered in a manner compatible with the dosage
formulation and in such amount as is therapeutically effective. The formulations are easily administered
in a variety of dosage forms, such as the type of injectable solutions described above, with even drug
release capsules and the like being employable.

For parenteral administration in an aqueous solution, for example, the solution should be suitably
buffered if necessary and the liquid diluent first rendered isotonic with sufficient saline or glucose. These
particular aqueous solutions are especially suitable for intravenous, intramuscular, subcutaneous and
intraperitoneal administration. In this connection, sterile aqueous media which can be employed will be
known to those of skill in the art in light of the present disclosure. For example, one dosage could be
dissolved in 1 mL of isotonic NaCl solution and either added to 1000 mL of hypodermoclysis fluid or
injected at the proposed site of infusion (see, for example, "Remington's Pharmaceutical Sciences" 15th
Edition, pages 1035-1038 and 1570-1580). Some variation in dosage will necessarily occur depending on
the condition of the subject being treated. The person responsible for administration will, in any event,
determine the appropriate dose for the individual subject.
P. Examples

The following examples are included to demonstrate preferred embodiments of the invention. It
should be appreciated by those of skill in the art that the techniques disclosed in the examples which
follow represent techniques discovered by the inventor to function well in the practice of the invention,
and thus can be considered to constitute preferred modes for its practice. However, those of skill in the
art should, in light of the present disclosure, appreciate that many changes can be made in the specific
embodiments which are disclosed and still obtain a like or similar result without departing from the spirit
and scope of the invention.

EXAMPLE 1 

plasma glucose concentration and insulin secretion rate (ISR) can be identified in subjects who have
inherited an at-risk MODY3 allele but who have not yet developed overt diabetes.

1. Methods

Subjects from MODY3 pedigrees
Thirteen Caucasian subjects who were positive for MODY3 markers on chromosome 12 were
studied. Two subjects were members of a French pedigree (Vaxillaire et al., 1995), three were from
the P pedigree from Michigan (Menzel et al., 1995), two from a New York pedigree, the H pedigree
depicted in FIG. 1, two were from a Liverpool pedigree, the BDA1 pedigree, and four from a Nottingham
pedigree, the BDA12 pedigree (FIG. ...). Each subject was typed with a series of DNA markers in the
region of MODY3 to determine whether or not they had inherited the at-risk haplotype segregating with
MODY in that family. The diabetes status of each subject, except for MD13, had been determined by oral
glucose tolerance testing (OGTT) according to the World Health Organization (WHO) criteria (WHO Study
Group on Diabetes Mellitus, 1985) and confirmed at the time of the studies by the measurement of
glycosylated hemoglobin. Based on the results of the OGTT and glycosylated hemoglobin values within or
above the normal range for the inventors' laboratory (<7.4%), subjects were divided into diabetic and
nondiabetic groups.

Nondiabetic MODY3 subjects (n=6).
The clinical profiles of these subjects are described in Table 4. All had normal fasting glucose and
glycosylated hemoglobin (<7.4%) levels at the time of this study. At the time of study 4 subjects had
IGT (MD1, MD4, MD9, MD13) and 2 subjects had normal glucose tolerance (NGT) (MD3, MD5). Based on
previous glucose tolerance testing MD1 had IGT, MD3 consistently demonstrated NGT on serial OGTTs,
MD4 had IGT in 6/93 and has persistent IGT with a 2 h postprandial blood glucose level
of 147 mg/dl, MD5 was initially diagnosed with IGT and subsequently had 2 normal OGTTs with 2-h
blood glucose values of 130 mg/dl and 105 mg/dl, respectively, MD9 had IGT, with a 2 h post challenge
blood glucose level of 167 mg/dl and no other blood glucose level above 200 mg/dl, and MD13 had IGT
with elevated postprandial blood glucose levels in the past up to 160 mg/dl. Age of diagnosis refers to
the age at which abnormal glucose tolerance was diagnosed. None of these subjects were ever
diagnosed with NIDDM.

Diabetic MODY3 subjects (n=7).
Clinical profiles are shown in Table 4. All subjects had been treated with oral hypoglycemic
agents except for MD8, who was taking insulin which was discontinued two days prior to the study, and
MD12, who was treated with diet alone. All subjects had discontinued treatment with oral hypoglycemic
agents at least three weeks prior to being studied. As shown in Table 4, fasting plasma glucose and total
glycosylated hemoglobin levels were higher in the diabetic group and fasting insulin levels were lower.
The diabetic group was also significantly older than the other two groups.
Nondiabetic controls.

The control subjects consisted of 5 males and one female (5 Caucasians and 1 African American)
who did not have a personal or family history of NIDDM. They were all within 20% of ideal body weight,
had no medical illnesses and were not receiving any medications. Data from four of the control subjects
have previously been published (Byrne et al., 1994; Byrne et al., 1995a). BMI was not significantly
different between the control and diabetic or nondiabetic MODY3 groups.

Female volunteers had regular menstrual cycles and were studied only in the early follicular
phase. The study was approved by the Institutional Review Board of the University of Chicago Medical
Center and all subjects and/or parents provided written informed consent.
Experimental protocol
Studies began at 0800 h with subjects in the recumbent position after a 12-h overnight fast. An
intravenous catheter was placed in each forearm, one for blood sampling and one for glucose
administration. In all experiments, the arm containing the sampling catheter was maintained in a heating
blanket or hot hand box to ensure arterialization of the venous sample.
Graded glucose infusion studies.
These studies were designed to characterize the dose-response relationships between glucose
and insulin secretion rate (ISR). In order to eliminate potentially confounding effects of differences in the
basal glucose concentration, each study began with the administration of a small bolus of insulin
intravenously (0.007 U/kg) followed by a low dose continuous infusion of insulin to lower the fasting
plasma glucose to similar levels in all groups (target plasma glucose = 5 mM). After a period of 20 min
during which time the exogenously administered insulin was allowed to decay, samples were drawn at 10
min intervals for 30 min to define baseline insulin, glucose and C-peptide levels. An intravenous infusion
of 20% dextrose was then started at a rate of 1 mg/kg/min, followed by infusions of 2 mg/kg/min, 3
mg/kg/min, 4 mg/kg/min, 6 mg/kg/min and 8 mg/kg/min. Each infusion rate was administered for a period
of 40 min. Insulin, C-peptide and glucose concentrations were measured at 10, 20, 30 and 40 min into
each infusion period.
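
The stepped infusion described above can be laid out as a simple schedule. The following is only an illustrative sketch: the rates, step length and sampling offsets come from the protocol text, while the function and field names are hypothetical and the clock is taken to start at the first dextrose step.

```python
# Graded glucose infusion schedule, following the protocol text.
RATES_MG_PER_KG_MIN = [1, 2, 3, 4, 6, 8]   # stepwise dextrose infusion rates
STEP_MIN = 40                               # each rate is held for 40 min
SAMPLE_OFFSETS_MIN = [10, 20, 30, 40]       # sampling times within each step


def build_schedule():
    """Return one dict per infusion step with absolute sampling times (min)."""
    schedule = []
    for i, rate in enumerate(RATES_MG_PER_KG_MIN):
        start = i * STEP_MIN
        schedule.append({
            "rate_mg_per_kg_min": rate,
            "start_min": start,
            "end_min": start + STEP_MIN,
            "sample_times_min": [start + t for t in SAMPLE_OFFSETS_MIN],
            "glucose_mg_per_kg": rate * STEP_MIN,  # glucose delivered in this step
        })
    return schedule


schedule = build_schedule()
total_minutes = schedule[-1]["end_min"]
total_glucose_mg_per_kg = sum(step["glucose_mg_per_kg"] for step in schedule)
```

Under these assumptions the six steps span 240 min and deliver 960 mg of glucose per kg of body weight, with 24 sampling points in total.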

Effects of prolonged intravenous glucose administration on insulin secretory responses to
graded glucose infusions.

At the completion of the graded glucose infusion study described above, glucose was infused
intravenously for a 42-h period at a rate of 4-6 mg/kg/min in order to determine if the insulin secretory
responses to glucose could be primed by exposure to mild hyperglycemia. Subjects also consumed three
carbohydrate enriched meals during the second day of this glucose infusion. At the conclusion of the 42-h
infusion period, the infusion rate was reduced over a 60 min period and then stopped. Thirty minutes
later, the graded glucose infusion study was repeated. Plasma glucose levels were obtained every four
hours during the 42-h glucose infusion.
Assays.

Plasma glucose was measured by the glucose oxidase technique (YSI analyzer, Yellow Springs,
OH). The coefficient of variation of this method is <2%. Serum insulin was assayed by a double
antibody technique (Morgan and Lazarow, 1963). The average intra-assay coefficient of variation was
6%. Plasma C-peptide was measured as previously described (Faber et al., 1978). The lower limit of
sensitivity of the assay was 0.02 pmol/ml and the intra-assay coefficient of variation averaged 6%. All
samples were measured in duplicate. Assays were performed at the University of Chicago.
Data analysis

Estimation of ISRs. ISRs were derived by deconvolution of plasma C-peptide concentrations
assuming a two-compartmental model of C-peptide clearance kinetics (Van Cauter et al., 1992; Eaton et
al., 1980; Polonsky et al., 1986).
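
The deconvolution step can be pictured with a simplified numerical sketch. It is not the procedure of the cited references. It assumes a biexponential (two-compartment) C-peptide impulse response, a secretion rate held constant within each sampling interval, and concentrations expressed as increments above a zero starting level. The kinetic parameters (half-lives, fraction, distribution volume) are illustrative placeholders, not values taken from this disclosure.

```python
import math


def step_kernel(n_steps, dt, short_half_life, long_half_life, fraction, volume):
    """Integral of the biexponential impulse response over each sampling step.

    Returns g[m] (min/L), so that a constant secretion rate ISR (pmol/min)
    held during one step contributes ISR * g[m] (pmol/L) m steps later.
    """
    a = math.log(2) / short_half_life
    b = math.log(2) / long_half_life

    def cumulative(t):
        fast = fraction / a * (1.0 - math.exp(-a * t))
        slow = (1.0 - fraction) / b * (1.0 - math.exp(-b * t))
        return (fast + slow) / volume

    return [cumulative((m + 1) * dt) - cumulative(m * dt) for m in range(n_steps)]


def simulate(isr, g):
    """Forward model: C-peptide concentration at the end of each step."""
    return [sum(isr[j] * g[k - j] for j in range(k + 1)) for k in range(len(isr))]


def deconvolve(conc, g):
    """Recover the stepwise secretion rate by forward substitution."""
    isr = []
    for k, c in enumerate(conc):
        carried = sum(isr[j] * g[k - j] for j in range(k))  # from earlier steps
        isr.append((c - carried) / g[0])
    return isr


# Round trip with hypothetical numbers: 10-min sampling, placeholder kinetics.
true_isr = [100.0, 150.0, 200.0, 250.0, 300.0, 350.0]          # pmol/min
g = step_kernel(len(true_isr), dt=10.0, short_half_life=5.0,
                long_half_life=33.0, fraction=0.76, volume=4.5)
c_peptide = simulate(true_isr, g)                               # pmol/L
recovered = deconvolve(c_peptide, g)
```

Direct inversion of this kind amplifies measurement noise, so a practical analysis would normally smooth or regularize the C-peptide data before deconvolution.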

Relationship between glucose and ISRs.

The relationship between plasma glucose and ISR was explored in each individual by analyzing
the data from the graded glucose infusion studies. Baseline glucose, insulin, C-peptide and ISRs were
calculated as the mean of the values in the -30, -20, -10 and 0 min samples. During each glucose infusion
period, average glucose and ISRs were calculated. Mean ISRs for each period were then plotted against
the corresponding mean glucose level, thereby establishing a dose-response relationship between glucose
and ISR. Mean ISRs were determined for 1 mM glucose concentration intervals by calculating the area
under the curve for each interval using the trapezoidal rule. This area was divided by 1 mM to obtain the
correct units (pmol/min).
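
The interval calculation can be sketched as follows. The sketch assumes linear interpolation between the measured (mean glucose, mean ISR) points, which the text does not state explicitly. The data points are hypothetical.

```python
def interpolate(points, x):
    """Linearly interpolate ISR at glucose level x from (glucose, ISR) points
    sorted by increasing glucose."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("glucose level outside the measured range")


def mean_isr(points, lo, hi):
    """Mean ISR (pmol/min) over the glucose interval [lo, hi] in mM:
    trapezoidal area under the dose-response curve divided by the width."""
    xs = [lo] + [x for x, _ in points if lo < x < hi] + [hi]
    ys = [interpolate(points, x) for x in xs]
    area = sum((x1 - x0) * (y0 + y1) / 2.0
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))
    return area / (hi - lo)


# Hypothetical dose-response points: (mean glucose in mM, mean ISR in pmol/min)
points = [(5.0, 100.0), (6.0, 140.0), (7.5, 200.0), (9.0, 290.0)]
isr_5_to_6 = mean_isr(points, 5.0, 6.0)
isr_7_to_8 = mean_isr(points, 7.0, 8.0)
```

Dividing the area (pmol/min × mM) by the interval width in mM returns a value in pmol/min, which is the unit conversion described in the text.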
Statistical analyses 

All results are expressed as mean ± SEM. Data analysis was performed using the Statistical
Analysis System (SAS Version 6 Edition for Personal Computers, SAS Institute, Inc., Cary, NC). The
significance of differences between the groups was determined using paired or unpaired t-tests or
analysis of variance where appropriate. Tukey's studentized range test was used for post hoc
comparisons. Pearson's correlation coefficient was used to evaluate correlations between pairs of
parameters.
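
As a small illustration of the mean ± SEM convention, the sketch below uses the sample standard deviation divided by the square root of n. The analysis reported here was carried out in SAS, so this Python version is only an assumed equivalent. The input values are the baseline entries listed for the nondiabetic MODY3 subjects in Table 5.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for a list of numbers."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))  # sample SD / sqrt(n)
    return mean, sem


# Baseline ISR values (pmol/min) listed for the nondiabetic MODY3 group in Table 5
baseline_nondiabetic = [188.1, 164.5, 136.6, 297.5, 249.1, 248.1]
mean, sem = mean_sem(baseline_nondiabetic)
print(f"{mean:.1f} ± {sem:.1f} pmol/min")
```

With these inputs the sketch gives roughly 214 ± 24.8 pmol/min, close to the group mean reported in the table.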

2. Results 

Glucose, insulin and ISR during graded intravenous glucose infusion
Fasting plasma glucose levels were higher in the MODY3 diabetic group compared to the
nondiabetic group or controls (7.5±0.7 mM vs. 4.5±0.2 mM and 4.7±0.2, respectively; P<0.0008).
The corresponding fasting plasma insulin levels were lower in the diabetic MODY3 group compared to
nondiabetics and controls (Table 4). Glucose, insulin and ISR responses to the glucose infusions are
shown in FIG. 2A, FIG. 2B and FIG. 2C, respectively. Average glucose concentrations over the duration of
the study were higher in the diabetic MODY3 subjects compared to the nondiabetic MODY3 and control
subjects (8.5±0.4 mM vs. 6.3±0.3 mM and 6.4±0.2; P<0.0002) (FIG. 2A). Average insulin levels were
lower in the diabetic and nondiabetic MODY3 groups than in the controls (57.4±8.2 pmol/L and
79.8±11.0 vs. 139.3±14.7 pmol/L; P<0.0006) (FIG. 2B). Average ISRs were significantly lower in
diabetic compared to the nondiabetic MODY3 subjects and the controls (116±18.8 pmol/min vs.
179.7±19.9 pmol/min and 199.5±18.7; P<0.02) (FIG. 2C).
TABLE 5



Insulin Secreted between 5 and 9 mM glucose

ID                    Baseline       Post-glucose   Priming effect %

Non-diabetic MODY3
MD1                   188.1          221.6          17.9
MD3                   164.5          255            55
MD4                   136.6          208.3          52.5
MD5                   297.5          342.5          15.1
MD9                   249.1          292.1          34.5
MD13                  248.1          234.2          -5.9
MEAN                  214.3±24.8     259±20.6       35±8

Diabetic MODY3
MD2                   67.4           68.9           2.2
MD6                   131.5          109.1          -17
MD7                   144.6          85.2           -41
MD8                   156.6          189.3          20.9
MD10                  63.7           34.9           -45
MD11                  38.2           28.4           -26
MD12                  102.6          115.1          12.2
MEAN                  100.8±17.3*    90.0±20.8*     -13.4±9.8*

Controls
C05                   318.1          356.8          12.2
C07                   209.5          272.1          29.2
C09                   166.9          223.1          33.7
C12                   235.6          381.6          62.0
C13                   215.6          306.5          42.2
C15                   120.1          180.5          50.3
MEAN                  211±27         287±32         38±7

p value               p<0.004        p<0.002        p<0.009

The amount of insulin secreted as glucose was raised from 5 to 9 mM in study subjects
before and after a priming intravenous infusion of glucose. Asterisks refer to statistically
significant differences between the diabetic subjects and those in the other two groups
using Tukey's studentized range test for post-hoc comparisons.
Changes in insulin sensitivity

Insulin resistance estimated by the Homeostasis Model Assessment Method (HOMA) (Matthews
et al., 1985) failed to demonstrate significant differences between the groups (diabetic MODY3:
1.9±0.2; nondiabetic MODY3: 1.7±0.3; controls: 2.4±0.2; P=0.11).
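
For orientation, the widely used simplified form of the HOMA index is fasting insulin (µU/ml) multiplied by fasting glucose (mmol/L) and divided by 22.5. The sketch below applies that form. Whether exactly this approximation was used here is an assumption, and the example inputs are hypothetical.

```python
def homa_ir(fasting_insulin_uU_per_ml, fasting_glucose_mM):
    """Simplified HOMA estimate of insulin resistance:
    insulin (uU/ml) x glucose (mmol/L) / 22.5."""
    return fasting_insulin_uU_per_ml * fasting_glucose_mM / 22.5


# Hypothetical subject: fasting insulin 10 uU/ml, fasting glucose 4.5 mM
example = homa_ir(10.0, 4.5)
```

Higher values indicate greater estimated insulin resistance. The constant 22.5 scales the index so that a reference subject with insulin 5 µU/ml and glucose 4.5 mM scores 1.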
Dose response relationship between glucose and ISR 

The ISR in the three groups was compared at the same plasma glucose level by plotting the mean
ISR at each glucose infusion rate against the corresponding mean glucose level. The resulting glucose-ISR
dose-response relationships are shown in FIG. 3. Over the 5-9 mM glucose concentration interval the
diabetic MODY3 group secreted significantly less insulin than subjects in the nondiabetic MODY3 and
control groups (101±17 pmol/min vs. 214±25 pmol/min and 211±27 pmol/min, respectively;
P<0.004). The mean insulin secretion rate did not differ between these latter two groups.

The dose response curves (FIG. 3) indicate that the insulin secretion rates were similar in
nondiabetic MODY subjects and controls at lower glucose concentrations. The amount of insulin secreted
as the glucose concentration was increased from 5-7 mM was similar in these two groups (180±19 vs.
160±17 pmol/min; P=0.45). Over the 7-8 mM glucose interval the nondiabetic MODY3 subjects
secreted 243.5±31.5 pmol/min compared to 284.7±30.5 pmol/min in controls (P=0.37). From 8-9 mM
glucose they secreted 257.1±35.0 pmol/min compared to 354.0±43.4 pmol/min in controls (P=0.12) (FIG.
3). As the glucose concentration was increased from 7-8 mM to 8-9 mM the increase in insulin secretion
rate in the nondiabetic MODY3 subjects was significantly less than in the controls (37.3±13.5 vs.
75.7±9.5 pmol/min; P<0.05).

Effect of low-dose glucose infusion on relationships between glucose and ISR

Mean glucose levels achieved during the 42-h constant glucose infusion were significantly higher
in the diabetic compared to the nondiabetic MODY3 group and controls (14.9±0.6 mM vs. 10.0±1.4 mM
vs. 6.6±0.3 mM; P<0.0001). The glucose infusion was discontinued after 42 h and low dose insulin
was administered resulting in a fall in the plasma glucose concentration to similar levels in the two
groups. The graded intravenous glucose infusion study was then repeated in each subject.

In order to quantify the priming effect of glucose on insulin secretion, the average ISR measured
during each glucose infusion rate was plotted against the average plasma glucose concentration and
compared with values obtained before glucose infusion. Over the glucose concentration range between 5
and 9 mM glucose, control subjects secreted 211±27 pmol/min before and 287±32 pmol/min
(P<0.005) insulin after glucose infusion (FIG. 4A). There was a shift in the glucose-ISR dose-response
curves upwards and to the left, with ISR increasing by 38±7%. The nondiabetic MODY3 group increased
their ISR from 214±25 pmol/min to 259±21 pmol/min (P<0.03) (FIG. 4B). The diabetic MODY3 group
had a small and non-significant 13±10% decrease in ISR after glucose administration (101±17 pmol/min
to 90±21 pmol/min) (FIG. 4C). Individual values for ISR from 5-9 mM glucose before and after
low dose glucose infusion are given in Table 5.
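
The priming effect can be expressed as the percent change in ISR from the baseline study to the post-infusion study. That definition is an assumption, although it reproduces most of the individual entries in Table 5. The sketch below applies it to the control-group values as listed in that table, taken in row order.

```python
import statistics


def priming_effect_percent(baseline_isr, post_glucose_isr):
    """Percent change in ISR after the prolonged glucose infusion."""
    return 100.0 * (post_glucose_isr - baseline_isr) / baseline_isr


# Control-group ISR values (pmol/min) from Table 5, rows C05 to C15
baseline = [318.1, 209.5, 166.9, 235.6, 215.6, 120.1]
post_glucose = [356.8, 272.1, 223.1, 381.6, 306.5, 180.5]

effects = [priming_effect_percent(b, p) for b, p in zip(baseline, post_glucose)]
mean_effect = statistics.mean(effects)
```

Averaging the per-subject percentages gives about 38% for the controls, which agrees with the group value quoted above. Taking the percent change of the group means (211 to 287 pmol/min) would give a slightly different figure, so the order of averaging matters.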

response to glucose 

There was a significant negative correlation between glycosylated hemoglobin and percent
priming (r = -0.78; P<0.002) and between glycosylated hemoglobin and ISR from 5-9 mM glucose
(r = -0.61). By contrast there was no significant decrease in ISR as glucose concentrations rose
from 7-8 to 8-9 mM with increasing glycosylated hemoglobin levels (r = -0.07; P=0.82).
3. Discussion 

Basal glucose levels were higher and insulin levels were lower in MODY3 subjects with diabetes
compared to nondiabetic subjects or normal healthy controls. In response to the graded glucose infusion,
insulin secretion rates were significantly lower in the diabetic subjects over a broad range of glucose
concentrations. Insulin secretion rates in the nondiabetic MODY3 subjects were not significantly
different from the controls at plasma glucose levels <8 mM. As glucose rose above this level, however, the
increase in insulin secretion in these subjects was significantly reduced. Administration of glucose by
intravenous infusion for 42 h resulted in a significant increase in the amount of insulin secreted over the
5-9 mM glucose concentration range in the controls and nondiabetic MODY3 subjects (by 38% and 36%,
respectively) but no significant change was observed in the diabetic MODY3 subjects. In conclusion, in
nondiabetic MODY3 subjects insulin secretion demonstrates a diminished ability to respond when blood
glucose exceeds 8 mM. The priming effect of glucose on insulin secretion is preserved. Thus, β-cell
dysfunction is present prior to the onset of overt hyperglycemia in this form of MODY. The defect in
insulin secretion in the nondiabetic MODY3 subjects differs from that reported previously in nondiabetic
MODY1 or mildly diabetic MODY2 subjects.
EXAMPLE 2

Mutations in HNF1α Relating to MODY3 Type Diabetes
1. Materials and Methods

Isolation of partial sequence of the human HNF1α gene.

The PAC clone, 254A7, containing the human HNF1α gene was isolated from a library (Genome
Systems, St. Louis, MO) by screening PAC DNA pools with PCR and the primers HNF1P1 (5'-
TACACCACTCTGGCAGCCACACT-3'; SEQ ID NO:10) and HNF1P2 (5'-CGGTGGGTACATTGGTGACAGAAC-
3'; SEQ ID NO:11). The sequences of the exons and flanking introns were determined after subcloning
fragments of the 254A7 into pGEM-4Z (Promega Biotec, Madison, WI) or pBluescript SK (Stratagene,
La Jolla, CA) and sequencing using primers based on the sequence of the human HNF1α cDNA (Bach et
al., 1990; and Bach and Yaniv, 1993) and selected using the conserved exon-intron organization of the
mouse and rat genes (Bach et al., 1992) as a guide. Sequencing was carried out using an AmpliTaq FS Dye
Terminator Cycle Sequencing Kit (ABI, Foster City, CA) on an ABI Prism™ 377 DNA Sequencer (ABI). The
sequences of the exon 2/intron 2, exon 3/intron 3, intron 6/exon 7, and intron 8/exon 9/intron 9 junctions
were determined by directly sequencing PCR products generated by amplification of PAC 254A7 or
human genomic DNA. FIG. 11 shows the cDNA sequence of HNF1α.
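
Basic properties of the two screening primers can be checked with a few lines of code. The sketch below uses the primer sequences as transcribed in the text. The helper names are hypothetical, and nothing here is taken from the sequence listing itself.

```python
# Primer sequences as given in the text (5' to 3')
PRIMERS = {
    "HNF1P1": "TACACCACTCTGGCAGCCACACT",
    "HNF1P2": "CGGTGGGTACATTGGTGACAGAAC",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


# (length in nt, GC content in percent) for each primer
summary = {name: (len(seq), round(100 * gc_fraction(seq), 1))
           for name, seq in PRIMERS.items()}
```

The reverse complement is the form in which the reverse primer would appear on the strand read by the forward primer, which is useful when locating a primer pair on a reference sequence.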

Screening of HNF1α gene for mutations.
The ten exons and flanking introns of the HNF1α gene of an affected subject from families in which
MODY cosegregated with markers spanning the MODY3 region of chromosome 12 (subjects with the
MODY3 form of NIDDM) were amplified using PCR and specific primers (Table 6). PCR conditions were
denaturation at 94°C for 5 min followed by 35 cycles of denaturation at 94°C for 30 sec, annealing at
62°C for 30 sec (except for exon 9, for which the annealing temperature was 60°C) and extension at
72°C for 45 sec, and final extension at 72°C for 10 min. The PCR products were purified using a
Centricon-100 membrane (Amicon, Beverly, MA) and sequenced from both ends using the primers shown
in Table 6, an AmpliTaq FS Dye Terminator Cycle Sequencing Kit and ABI Prism™ 377 DNA Sequencer. The
presence of the specific mutation in other family members was assessed by amplifying and directly
sequencing the appropriate exon. At least 40 normal unrelated healthy non-diabetic non-Hispanic white
subjects (80 chromosomes) were also similarly screened. DNA polymorphisms identified during the course
of screening patients for mutations were characterized by PCR and direct sequencing, or digestion with
an appropriate restriction endonuclease and gel electrophoresis.
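
The cycling conditions above can be written out as an explicit thermocycler program. This is a sketch only: temperatures, hold times and cycle count follow the text, the class and function names are hypothetical, and ramp times between steps are ignored.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    temp_c: int
    seconds: int


def pcr_program(annealing_temp_c=62, cycles=35):
    """Build the list of hold steps for one amplification run."""
    initial = [Step("initial denaturation", 94, 5 * 60)]
    cycle = [
        Step("denaturation", 94, 30),
        Step("annealing", annealing_temp_c, 30),
        Step("extension", 72, 45),
    ]
    final = [Step("final extension", 72, 10 * 60)]
    return initial + cycle * cycles + final


def total_minutes(program):
    """Total hold time in minutes, excluding temperature ramps."""
    return sum(step.seconds for step in program) / 60


standard = pcr_program()                  # exons annealed at 62 degrees C
exon9 = pcr_program(annealing_temp_c=60)  # exon 9 uses 60 degrees C annealing
```

Under these assumptions one run holds for 76.25 min in total, of which 61.25 min is spent in the 35 cycles.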
Table 7 identifies the DNA polymorphisms found in the coding region of the HNF1α gene. Of
course these are exemplary polymorphisms and those of skill in the art will easily be able to employ the
methods and descriptions set forth in the present invention to identify other polymorphisms.

Table 7.

DNA polymorphisms identified in coding region of human HNF1α gene

Exon        Codon     Nucleotide change           Frequency
1           17        CTC (Leu) → CTG (Leu)       C, 0.57; G, 0.43
1           27        ATC (Ile) → CTC (Leu)       A, 0.63; C, 0.37
1           98        GCC (Ala) → GTC (Val)       C, 0.98; T, 0.02
4           279       GGG (Gly) → GGC (Gly)       G, 0.69; C, 0.31
7           459       CTG (Leu) → TTG (Leu)       C, 0.63; T, 0.37
7           487       AGC (Ser) → AAC (Asn)       G, 0.68; A, 0.32
8           515       ACG (Thr) → ACA (Thr)       G, 0.79; A, 0.21
Intron 1    nt-91     A → G                       A, 0.88; G, 0.12
Intron 1    nt-42     G → A                       G, 0.66; A, 0.34
Intron 2    nt51      T → A                       T, 0.85; A, 0.15
Intron 2    nt-23     C → T                       C, 0.88; T, 0.12
Intron 5    nt-47     C → T                       C, 0.99; T, 0.01
Intron 7    nt-7      G → A                       G, 0.57; A, 0.43
Intron 9    nt-44     C → T                       C, 0.95; T, 0.04
Intron 9    nt-24     T → C                       T, 0.59; C, 0.41
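
The amino acid annotations for the exon rows of Table 7 can be cross-checked against the standard genetic code. In the sketch below the codon table is built from the standard code in TCAG order. For codon 98 the reference codon is taken as GCC, the alanine codon consistent with the Ala annotation and the C/T allele frequencies. The function name is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# (codon number, reference codon, variant codon) for the exon rows of Table 7
CODING_POLYMORPHISMS = [
    (17, "CTC", "CTG"),
    (27, "ATC", "CTC"),
    (98, "GCC", "GTC"),
    (279, "GGG", "GGC"),
    (459, "CTG", "TTG"),
    (487, "AGC", "AAC"),
    (515, "ACG", "ACA"),
]


def classify(ref, alt):
    """Return (reference residue, variant residue, kind of change)."""
    ref_aa, alt_aa = CODON_TABLE[ref], CODON_TABLE[alt]
    kind = "silent" if ref_aa == alt_aa else "amino acid change"
    return ref_aa, alt_aa, kind


results = {codon: classify(ref, alt) for codon, ref, alt in CODING_POLYMORPHISMS}
```

Run on these rows, the check yields four silent changes and three amino acid substitutions (I27L, A98V and S487N).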



Table 8 shows a summary of mutations identified in human HNF1α in patients with MODY3.
Sixteen exemplary mutations were identified in the HNF-1α gene in MODY3 patients but were not present in
unaffected individuals. These mutations include frameshifts in exons 1, 4, 6, and 9, missense coding in exons
2 and 7 as well as abnormal splicing in introns 5 and 9. The results described herein demonstrate that
mutations in this transcription factor can cause diabetes mellitus and focus attention on the role of HNF-1α
in determining normal pancreatic β cell function.
3. Discussion

Linkage analysis localized MODY3 to a 10 cM interval of chromosome 12 between the markers
D12S86 and D12S342 (Vaxillaire et al., 1995) and then to a 5 cM interval between the markers D12S86
and D12S807/D12S820 (Menzel, S. et al., 1995). A combined YAC, BAC and PAC contig spanning
D12S86 and D12S807 (FIG. 9) was generated using information in public databases (Chumakov et al.,
1995; Hudson et al., 1995) and screening appropriate libraries (YAC and BAC, Research Genetics,
Huntsville, Alabama; and PAC, Genome Systems, St. Louis, Missouri) with STSs from the MODY3 region.
The physical map allowed localization of new polymorphisms as they were reported as well as
generation of new markers to further localize recombination events in key individuals. Such studies refined the
localization of MODY3 to the 3 cM interval between D12S1666 and the polymorphic STS UC-39.
Fluorescence in situ chromosomal hybridization using the BAC 162B15 mapped the contig to chromosome
band 12q24.2.

This combination of genetic and physical mapping information was used to begin a systematic
search for MODY3. Using a combination of approaches including testing genes known to be on the long
arm of chromosome 12 to see if they mapped into the contig, exon-trapping (Church et al., 1994), and
cDNA selection (Kaplan et al., 1992) using human pancreatic islet cDNA (clinical studies had shown that
insulin secretion was abnormal in MODY3 patients, and thus islets were a likely site of expression of
MODY3 mRNA and protein), the inventors identified 14 genes encoding known proteins (γ-subunit of
AMP-activated protein kinase, citron, the GTP-binding protein H-ray, paxillin, acidic ribosomal
phosphoprotein P0, pancreatic phospholipase A2, splicing factor SRp30, cytochrome C oxidase subunit
VIa, short chain acyl CoA dehydrogenase, HNF-1α, thyroid receptor interactor (TRIP14) protein,
Ca2+/calmodulin-dependent protein kinase, P2X purinoceptor and restin), 5 pseudogenes
(metallopanstimulin-like, cell surface heparin binding protein-like, ribosomal protein L12-like, nucleoside
diphosphate kinase-like and ADP ribosylation factor-like), 12 ESTs (yq81d09, yd50d03, IB383, hbc3028,
yu36h05, yn75d09, yz51b06, yd58g07, ym03h09, ym30e05, WI-6178/c.01h06, WI-6239/c.04b12) and
9 unknown genes (FIG. 9).

These genes were being systematically sequenced in affected and unaffected subjects using
nested PCR and illegitimate transcription of lymphoblastoid RNA (Kaplan et al., 1992), as well as PCR of
individual exons of the gene. Comparison of the sequences of the pancreatic phospholipase A2, γ-subunit
of AMP-activated protein kinase, H-ray, cytochrome C oxidase subunit VIa, acidic ribosomal
phosphoprotein P0, paxillin, splicing factor SRp30, short chain acyl CoA dehydrogenase, and P2X
purinoceptor genes from patients and controls revealed a number of polymorphisms but no MODY3-
associated mutations.

The HNF-1α gene was localized in the interval containing MODY3 using PCR and HNF-1α gene
specific primers (FIG. 9). HNF-1α cDNAs were also isolated at high frequency by cDNA selection from
human pancreatic islet cDNA using PAC 254A7, a result consistent with the report of Emens (1992)
showing that HNF-1α was expressed in hamster insulinoma cells and functioned as a weak transactivator
of the rat insulin I gene. The human HNF-1α gene was isolated and partially sequenced to provide the
exon-intron organization and the sequences of introns from which primers could be selected for PCR. The
human gene consists of 10 exons with introns 1-8 located in the same positions as in the rat and mouse
genes (Bach, 1992). Intron 9 interrupts codon 590 (phase 1) and is not present in the rat and mouse
genes but does occur in the chicken gene (Horlein et al., 1993), consistent with loss of this intron during
the period when humans and rodents shared their last common ancestor. Amplification and direct
sequencing of exon 4 of subject EA1 (Edinburgh pedigree, FIG. 5A) showed an insertion of a C in codon
289 (Pro) resulting in a frameshift and premature termination (designated P289fsinsC) (FIG. 10). This
mutation was present in all affected members and no unaffected members of this family. It was also not
found on screening 55 healthy non-diabetic white subjects (110 chromosomes). Hence it was concluded
that the HNF-1α gene is MODY3, and this led the inventors to sequence the HNF-1α gene in other families in
which NIDDM cosegregated with markers from the MODY3 region.
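
The consequence of a single-base insertion such as P289fsinsC can be illustrated on a toy sequence. The short DNA string below is hypothetical and is not the HNF-1α sequence. It is chosen only so that inserting one C shifts the reading frame onto a premature stop codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate from the first base until a stop codon or the end of the sequence."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def insert_base(dna, index, base):
    """Return the sequence with one base inserted before position `index`."""
    return dna[:index] + base + dna[index:]


wild_type = "ATGCCCCTTGACAAATAA"          # hypothetical: Met-Pro-Leu-Asp-Lys-stop
mutant = insert_base(wild_type, 3, "C")   # extra C inside the proline codon

wild_type_protein = translate(wild_type)
mutant_protein = translate(mutant)
```

In this toy case the wild-type sequence reads MPLDK, while the insertion shifts the frame so that translation stops after MPP. A frameshift that produces a truncated protein in this way is the kind of outcome described for the insertion in codon 289.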

Fifteen additional mutations were found (Table 8), all of which co-segregated with NIDDM and
did not occur in any of at least 50 healthy non-diabetic white subjects. However, there were individuals in
several pedigrees (GK pedigree, III-3; Ber pedigree; and P pedigree, IV-B and IV-G) who had inherited
the mutant chromosome (and at-risk chromosome 12 haplotype) but who were non-diabetic or showed
only evidence of impaired glucose tolerance or diabetes during pregnancy. These individuals will likely
develop NIDDM in the future. In addition, one subject with NIDDM did not have the mutant allele (Ber
pedigree, II-2). He was diagnosed with NIDDM at 65 years of age, at which time he was mildly obese with
a body mass index of 27 kg/m², suggesting a diagnosis of late-onset NIDDM rather than MODY. Such
heterogeneity within MODY families has been noted previously (Bell et al., 1991; Vionnet, 1992) and is due
to the high frequency of late-onset NIDDM which affects 10% or more of individuals over age 65 years
(Kenny et al., 1995). In addition to the mutations listed in Table 8, three amino acid polymorphisms
(I/L27, A/V98 and S/N487), four silent polymorphisms (in codons for L17, G288, L459 and T515) and
seven polymorphisms in introns were found in the HNF-1α gene (Tables 7 and 8).
Sixteen different mutations in the HNF-1α gene were identified in patients with the MODY3 form
of diabetes. The splicing and frameshift mutations would be predicted to result in the expression of a
truncated protein having at least amino acids 1-290 of the native protein. The missense mutations,
R131Q and P447L, are of residues that are conserved in human, rat, mouse, hamster, chicken, Xenopus
and salmon HNF-1α and the structurally-related transcription factor human HNF-1β, suggesting that these
residues are functionally important.

HNF-1α is one of a group of transcription factors expressed in liver that act together to confer
tissue-specific expression of genes in this tissue (Tronche et al., 1992; Bach et al., 1990). It is also
found in kidney, intestine, stomach and pancreas, including islets of Langerhans, and at low levels in
spleen and testis, suggesting that it plays a role in transcriptional regulation in these tissues as well.
HNF-1α is composed of three functional domains: an NH2-terminal dimerization domain (amino acids 1-32), a
DNA binding domain with POU-like and homeodomain-like motifs (amino acids 150-280) and a COOH-
terminal transactivation domain (amino acids 281-631). The functional form of HNF-1α is a dimer and
HNF-1α may form homodimers or heterodimers with the structurally-related protein HNF-1β (Mendel et
al., 1991).

Pontoglio et al. (1996) have generated mice that lack HNF-1α. Homozygous HNF-1α-deficient
animals failed to thrive and usually died around the time of weaning. They also suffered from
phenylketonuria and renal tubular dysfunction. However, the homozygous HNF-1α-deficient mice did not
appear to be diabetic as they had normal blood glucose levels and a normal response to an intravenous
bolus injection of glucose. The massive glucosuria in these animals, though, may have masked the presence
of diabetes mellitus. The insulin secretory responses of heterozygous HNF-1α-deficient mice, animals that
may be most similar to human subjects with HNF-1α mutations and MODY, were not reported. In view of
the present findings that mutations in the HNF-1α gene cause early-onset NIDDM, more detailed
evaluation of β-cell and liver function in HNF-1α-deficient mice is indicated.

The mechanism by which mutations in the HNF-1α gene, when present on a single allele, can cause
diabetes is unclear. However, it is possible that a partial deficiency of HNF-1α could lead to β-cell
dysfunction and diabetes. Alternatively, mutations in HNF-1α may cause diabetes by a dominant-
negative mechanism (Herskowitz, 1987) by interfering with the function of wild-type HNF-1α and other
proteins which act in concert with HNF-1α to regulate transcription in the β-cell and/or liver. All of the
HNF-1α gene mutations identified to date would result in the synthesis of a mutant protein impaired in
DNA binding or transactivation but not dimerization. These mutant proteins could form non-productive
dimers with the product of the normal HNF-1α allele or other proteins such as HNF-1β and thereby impair
the normal function of HNF-1α.

The inventors have previously shown that diabetes mellitus in the Zucker diabetic fatty rat, a
rodent model of obesity and NIDDM, is associated with decreased expression of a large number of β-cell
genes, including genes such as insulin whose expression is restricted to the β-cell as well as others with
a much broader tissue distribution (Tokuyama et al., 1995). Thus, it is believed that NIDDM is likely to be
a disorder of transcription, with genetic or acquired defects affecting key proteins that regulate
transcription leading to β-cell dysfunction and diabetes.

EXAMPLE 3 

Mutations in HNF-4α Relating to MODY1 Type Diabetes

The PAC clones 114E13, 130B8 and 207N8, containing the human HNF-4α gene, were isolated from a
library (Genome Systems, St. Louis, MO) by screening PAC DNA pools with PCR and the primers HNF4P1
(5'-CACCTGGTGATCACGTGGTC-3'; SEQ ID NO:81) and HNF4P2 (5'-GTAAGGCTCAAGTCATCTCC-3'; SEQ
ID NO:82). The sequences of the exons and flanking introns were determined by direct sequencing using
primers based on the sequence of the human HNF-4α cDNA (Chartier et al., 1994; Drewes et al., 1996)
and selected using the conserved exon-intron organization of the mouse gene (Taraviras et al., 1994) as a guide.
Sequencing was carried out using an AmpliTaq FS Dye Terminator Cycle Sequencing Kit (ABI, Foster City, CA)
on an ABI Prism™ 377 DNA Sequencer (ABI).
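
As a quick sanity check on the two screening primers quoted above, the following minimal sketch computes their length, GC fraction and a rough melting temperature. The primer sequences are taken from the text; the Wallace rule of thumb (2 °C per A/T, 4 °C per G/C) and the helper names `gc_fraction` and `wallace_tm` are assumptions introduced here for illustration and are not part of the described method.

```python
# Primer sequences copied from the text (SEQ ID NO:81 and SEQ ID NO:82).
PRIMERS = {
    "HNF4P1": "CACCTGGTGATCACGTGGTC",
    "HNF4P2": "GTAAGGCTCAAGTCATCTCC",
}


def gc_fraction(seq: str) -> float:
    """Fraction of bases in the primer that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule of thumb.

    Assumption: this rule is only a coarse estimate for short oligos;
    the text does not state how annealing temperatures were chosen.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC={gc_fraction(seq):.2f}, Tm~{wallace_tm(seq)} C")
```

Both primers are 20-mers with similar estimated melting temperatures, which is what one would expect of a primer pair intended to work under a single set of cycling conditions.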
Screening of HNF-4α gene for mutations.
The eleven exons and flanking introns of the HNF-4α gene of affected subjects from families in
which MODY cosegregated with markers spanning the MODY1 region of chromosome 20 (subjects with
the MODY1 form of NIDDM) were amplified using PCR and specific primers (Table 9). PCR conditions
were denaturation at 94°C for 5 min followed by 35 cycles of denaturation at 94°C for 30 sec.

appropriate restriction endonuclease and gel electrophoresis.

Table 10 identifies the DNA polymorphisms and mutations identified in the coding region of the
HNF-4α gene. Of course, these are exemplary polymorphisms and those of skill in the art will easily be
able to employ the methods and descriptions set forth in the present invention to identify other
polymorphisms. FIG. 7 shows an alignment of the HNF-4α protein sequence from humans with sequences
from mouse, X. laevis and Drosophila. The putative DNA binding sites are underlined and the
putative ligand binding sites are in bold. The DNA sequences for exon 1, exon 1b, exon 2, exon 3, exon 4,
exon 5, exon 6, exon 7, exon 8, exon 9 and exon 10 of HNF-4α are shown in FIG. 8A, FIG. 8B, FIG. 8C, FIG.
8D, FIG. 8E, FIG. 8F, FIG. 8G, FIG. 8I, FIG. 8H, FIG. 8I and SEQ ID NO:34, SEQ ID NO:36, SEQ ID NO:38,
SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID
NO:52, and SEQ ID NO:54, respectively. It is contemplated that mutations in any of these exons, or the
related intron regions therebetween, of HNF-4α will result in MODY1 type diabetes.

Table 10.

Polymorphisms and Mutations in the Human HNF-4α Gene

Location
(Exon)     Codon   Nucleotide change          Frequency
4          130     ACT (Thr) → ATT (Ile)      C:T = 105:5; C = 0.95, T = 0.05
7          273     GAT (Asp) → GAC (Asp)      T:C = 169:1; T = 0.994, C = 0.006
7          268     CAG (Gln) → TAG (stop)     0/216 control chromosomes
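
The frequency column of Table 10 follows directly from the chromosome counts. The sketch below, a minimal illustration rather than the inventors' procedure, recomputes allele frequencies from the counts given in the table and the number of control chromosomes from the number of diploid subjects screened; the function names are hypothetical.

```python
def allele_frequencies(counts: dict) -> dict:
    """Convert per-allele chromosome counts into allele frequencies."""
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def chromosomes_screened(n_subjects: int) -> int:
    """Each diploid subject contributes two copies of an autosomal gene."""
    return 2 * n_subjects


if __name__ == "__main__":
    # Codon 130 (exon 4): C:T = 105:5, counts taken from Table 10.
    codon130 = allele_frequencies({"C": 105, "T": 5})
    print({a: round(f, 2) for a, f in codon130.items()})

    # Codon 273 (exon 7): T:C = 169:1, counts taken from Table 10.
    codon273 = allele_frequencies({"T": 169, "C": 1})
    print({a: round(f, 3) for a, f in codon273.items()})

    # 108 control subjects correspond to 216 control chromosomes.
    print(chromosomes_screened(108))
```

Counts of 169:1 correspond to frequencies of about 0.994 and 0.006, and 108 subjects correspond to the 216 control chromosomes quoted for the codon 268 mutation.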



The R-W pedigree, which includes more than 360 members spanning 6 generations and 74
members with diabetes including those with MODY, has been studied prospectively since 1958 (Fajans,
1989). The members of this family are descendants of a man who was born in East Prussia in 1809 and
emigrated to Detroit, Michigan in 1861 with his four sons, three of whom were diabetic, and five
daughters, one of whom was diabetic (Fajans, 1989; Fajans et al., 1994). Linkage studies have shown
that the gene responsible for MODY in this family, MODY1, is tightly linked to markers in chromosome
band 20q12-q13.1 with a multipoint lod score > 14 in those branches of the family in which MODY is
segregating (Bell et al., 1991; Bowden et al., 1992; Irwin et al., 1994). The analysis of key
recombinants in the R-W pedigree localized MODY1 to a 13-cM interval (~7 Mb) between D20S169 and
D20S176, an interval which also includes the gene encoding HNF-4 (Stoffel et al., 1996). The
demonstration in the previous examples that mutations in the HNF-1α gene are the cause of the MODY3
form of NIDDM prompted the inventors to screen the HNF-4α gene for mutations in the R-W pedigree.

The human HNF-4α gene consists of 11 exons with introns located in the same
positions as in the mouse gene (Taraviras et al., 1994). Alternative splicing generates a family of HNF-
4α mRNAs, HNF-4α1, α2 and α4, the latter two of which contain inserts of 30 and 90 nucleotides,
respectively (Taraviras et al., 1994; Laine, 1994; Drewes, 1996). Of these, HNF-4α2 mRNA appears
to be the most abundant transcript in many tissues. In contrast to a previous report (Drewes et al.,
1996), the inventors' studies show that HNF-4α4 mRNA encodes a truncated and presumably
nonfunctional form of HNF-4α. The sequence of exon 1B, the exon encoding the insertion in HNF-4α4
mRNA, revealed an additional T between nucleotides 219 and 220 in both alleles of five unrelated
individuals (10 chromosomes) not present in the cDNA sequence (Drewes et al., 1996), which causes a
frameshift and the generation of a protein of 98 amino acids whose function, if any, is unknown. The 11
exons of the HNF-4α gene of two affected, V-20 and 22, and one unaffected, VI-9, subject from the R-W
pedigree were amplified and the PCR products sequenced directly. The sequences were identical to one
another and to the cDNA (Drewes et al., 1996; Laine et al., 1994), except for C→T substitutions in
exon 4, codon 130 and exon 7, codon 268. The C→T substitution in codon 130 results in a Thr
(ACT)→Ile (ATT) substitution and is a polymorphism (T/I130) with a frequency of the Ile allele in a group
of 55 unrelated nondiabetic non-Hispanic white subjects of 6%. The C→T substitution in codon 268
results in a nonsense mutation, CAG (Gln)→TAG (AM) (Q268X). The nonsense mutation was confirmed
by cloning and sequencing PCR products derived from both alleles. The Q268X mutation created a site for
the enzyme Bfa I, with digestion of the normal allele generating fragments of 281 and 34 bp and the
mutant allele, 152, 129 and 34 bp, facilitating testing for this mutation in other members of the R-W
pedigree. In the R-W pedigree, Ile130 and the amber mutation at codon 268 were present in the same
allele.
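
The codon changes described above can be checked against the standard genetic code. The following sketch is an illustration only: it builds the standard codon table from the usual TCAG ordering and classifies a single-codon substitution as silent, missense or nonsense. The function names `translate_codon` and `classify_substitution` are hypothetical helpers introduced here.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}


def translate_codon(codon: str) -> str:
    """Return the one-letter amino acid for a DNA codon ('*' means stop)."""
    return CODON_TABLE[codon.upper()]


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change as 'silent', 'nonsense' or 'missense'."""
    ref_aa, alt_aa = translate_codon(ref_codon), translate_codon(alt_codon)
    if ref_aa == alt_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


if __name__ == "__main__":
    # Codon changes quoted in the text for the HNF-4α gene.
    for label, ref, alt in [
        ("codon 130", "ACT", "ATT"),
        ("codon 268", "CAG", "TAG"),
        ("codon 273", "GAT", "GAC"),
    ]:
        print(label, translate_codon(ref), "->", translate_codon(alt),
              classify_substitution(ref, alt))
```

Under the standard code, ACT→ATT is a Thr→Ile missense change, CAG→TAG converts Gln to an amber stop codon, and GAT→GAC leaves Asp unchanged, in agreement with the descriptions in the text.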

The Q268X mutation cosegregated with the at-risk haplotype and NIDDM in the R-W pedigree
and was not observed on screening 108 healthy nondiabetic non-Hispanic white subjects (216 normal
chromosomes). Seven subjects in the R-W pedigree who have inherited the mutant allele (V-18, 37 and
48; and VI-8, 11, 15 and 20) have normal glucose tolerance. The ages of five of these subjects (V-48
and VI-8, 11, 15 and 20) are less than 25 years and thus they are still within the age range when
diabetes usually develops in at-risk individuals in this family. Of the others, subject V-18 is 44 years of
age and has shown normal glucose on all oral glucose tolerance tests, and subject V-37, who is 36 years
of age, had one glucose tolerance test characteristic of impaired glucose tolerance and one of diabetes at
ages 16-17 years, but for the past 19 years each glucose tolerance test has been normal even though she
has a low insulin response to orally administered glucose. She is very lean and active, and has increased
sensitivity to insulin during the frequently sampled intravenous glucose tolerance test. During a prolonged
low dose glucose infusion, she became markedly hyperglycemic (Herman et al., 1994; Byrne et al., 1995).
Two subjects (V-1 and 4) who have the mutation were considered nondiabetic based on medical history
and their affection status needs to be evaluated by oral glucose tolerance testing. The results indicate
that the nonsense mutation in the HNF-4 gene in the R-W pedigree is highly but not completely penetrant,
although the age of diabetes onset is variable.
In addition to subjects who inherited the Q268X mutation but are presently nondiabetic, there are
subjects in the R-W pedigree who have NIDDM but did not inherit the Q268X mutation or at-risk
haplotype. Subject IV-9 was diagnosed with NIDDM at 48 years of age and was hyperinsulinemic, a
diagnosis consistent with late-onset NIDDM rather than MODY. The inventors also tested her six
children, one of whom had NIDDM and another impaired glucose tolerance, and all had two normal alleles.
Similarly, 10 children of subject III-7, five of whom had NIDDM, were also tested, and none had inherited
the Q268X mutation, suggesting that the NIDDM in this branch of the R-W family is of a different
etiology. Finally, the five nondiabetic children of III-11 were also tested and all were normal. The
presence of both MODY and late-onset NIDDM in the R-W family has been noted previously (Bell et al.,
1991; Bowden et al., 1992). The MODY phenotype results from a mutation in the HNF-4 gene. The
cause(s) of the late-onset NIDDM is unknown.

HNF-4 is a member of the steroid/thyroid hormone receptor superfamily and is expressed at
highest levels in liver, kidney and intestine (Xanthopoulos et al., 1991; Sladek et al., 1990). It is also
expressed in pancreatic islets and insulinoma cells (Miquerol et al., 1994). In liver, HNF-4α is a key
regulator of hepatic gene expression and is a major activator of HNF-1α, which in turn activates
expression of a large number of liver-specific genes including those involved in glucose, cholesterol and
fatty acid metabolism (Sladek et al., 1990; Kuo et al., 1992). Its expression in kidney, intestine and
pancreatic islets implies that it plays a central role in tissue-specific regulation of gene expression in
these tissues as well, although its specific function in nonhepatic tissues has not been addressed.
Homozygous loss of functional HNF-4α protein causes embryonic lethality characterized by defects in
gastrulation, underscoring the key role played by this transcription factor in development and
differentiation (Chen et al., 1994). The phenotype of the heterozygous animals was not described and
further studies are necessary to determine if they represent a mouse model of MODY.

HNF-4α defines a subclass of nuclear receptors which reside primarily in the nucleus and bind to
their recognition site and regulate transcription as homodimers (Sladek et al., 1994; Kuo et al., 1992).
The key role played by HNF-4α in the regulation of hepatic gene expression is well established (Sladek et
al., 1994; Kuo et al., 1992). However, its role, as well as that of HNF-1α, the MODY3 product and a
downstream target of HNF-4α action, in regulating gene expression in the insulin-secreting pancreatic β-
cell is largely unknown, although Emens et al. (1992) have shown that HNF-1α is a weak transactivator
of the insulin gene. Thus, the mechanism by which mutations in HNF-4α result in an autosomal dominant
form of NIDDM characterized by pancreatic β-cell dysfunction is unclear. The nonsense mutation in HNF-
4α found in the R-W family is predicted to result in the synthesis of a protein of 267 amino acids with an
intact DNA binding domain. However, it is missing the regions involved in dimerization and transcriptional
activation in other members of the steroid/thyroid hormone superfamily (Zhang et al., 1994; Bourguet et
al., 1995; Renaud et al., 1995; Wagner et al., 1995) and as a consequence is predicted to be unable
to dimerize, bind to its recognition site and activate transcription. Thus, the dominant inheritance is due
to a reduction in the amount of HNF-4α per se rather than a dominant negative mechanism. The
decreased levels of functional HNF-4α appear to have a critical effect on β-cell function, perhaps as a
consequence of decreased HNF-1α gene expression, mutations in this gene also leading to MODY as
described in the examples above. Prediabetic subjects with mutations in either the HNF-4α or HNF-1α
genes exhibit similar abnormalities in glucose-stimulated insulin secretion, with normal insulin secretion
rates at lower glucose concentrations but lower than normal rates as the glucose concentration increases
(Byrne et al., 1995), a result consistent with HNF-4α and HNF-1α affecting a common pathway in the
pancreatic β-cell. The absence of overt hepatic, renal or gastrointestinal dysfunction in affected
members of the R-W pedigree suggests that the levels of HNF-4α in these tissues, although possibly
lower than normal, are sufficient to ensure normal function or that alternative pathways are sufficient for
expression of key genes. However, detailed studies of hepatic glucose production and metabolism have
not been performed in subjects from the R-W pedigree and it is possible that subtle alterations in these
processes may be present.

The demonstration that MODY can result from mutations in the HNF-1α and HNF-4α genes
suggests that this form of NIDDM is primarily a disorder of abnormal gene expression. In this regard,
genes encoding other proteins in the HNF-1α/HNF-4α regulatory cascade, such as other members of the
HNF-1 (Mendel et al., 1994) and HNF-4 families (Drewes et al., 1996) as well as HNF-3 (Lai et al., 1993),
HNF-6 (Lemaigre et al., 1996), and perhaps dimerization cofactor of HNF-1 (Mendel, 1991), should
be considered as candidates for other forms of MODY and/or late-onset NIDDM. The role of HNF-4α in
the development of the more common late-onset NIDDM is unknown. There is no evidence for linkage of
markers flanking the HNF-4α gene with late-onset NIDDM in Mexican Americans or Japanese, implying
that mutations in the HNF-4α gene are unlikely to be a significant genetic factor contributing to the
development of late-onset NIDDM. However, acquired defects in HNF-4α expression may contribute, at
least in part, to the β-cell dysfunction which characterizes late-onset NIDDM (Polonsky et al., 1996),
especially if it plays a central role in regulating gene expression in the pancreatic β-cell as suggested by
its association with MODY. Furthermore, the similarity between HNF-4α and ligand-dependent
transcription factors raises the possibility that HNF-4α and the genes it regulates respond to an
unidentified ligand. The identification of such a ligand by the methods of the present invention will lead
to new approaches for treating diabetes.

EXAMPLE 4 

Organization and Partial Sequence of the HNF-4α/MODY1 Gene and Identification of a
Missense Mutation, R127W, in a Japanese Family with MODY

HNF-4α is a member of the nuclear receptor superfamily, a class of ligand-activated transcription
factors. A nonsense mutation in the gene encoding this transcription factor has been recently found in a
white family with one form of maturity-onset diabetes of the young, MODY1. In the present example, the
inventors report the exon-intron organization and partial sequence of the human HNF-4α gene. In
addition, the inventors have screened the twelve exons, flanking introns and minimal promoter region for
mutations in a group of 57 unrelated Japanese subjects with early-onset NIDDM/MODY of unknown
cause. Eight nucleotide substitutions were noted, of which one resulted in the mutation of a conserved
arginine residue, Arg127 (CGG)→Trp (TGG) (designated R127W), located in the T-box, a region of the
protein that may play a role in HNF-4α dimerization and DNA binding. This mutation was not found in
214 unrelated nondiabetic subjects (53 Japanese, 53 Chinese, 51 white and 57 African-American). The
R127W mutation was only present in three of five diabetic members in this family, indicating that it is not
the only cause of diabetes in this family. The remaining seven nucleotide substitutions were located in
the proximal promoter region and introns. They are not predicted to affect the transcription of the gene
or mRNA processing and represent polymorphisms and rare variants. The results suggest that mutations
in the HNF-4α gene may cause early-onset NIDDM/MODY in Japanese but they are less common than
mutations in the HNF-1α/MODY3 gene. The information on the sequence of the HNF-4α gene and its
promoter region will facilitate the search for mutations in other populations and studies of the role of this
gene in determining normal pancreatic β-cell function.

1. Methods 

Isolation and partial sequence of the human HNF-4α gene

Three P1-derived artificial chromosome (PAC) clones, 114E13, 130B8 and 207N8, containing the
human HNF-4α gene were isolated by screening PAC DNA pools (Genome Systems, St. Louis, MO) by
PCR™ with HNF-4α-specific primers (Yamagata et al., 1996a). The partial sequence of the HNF-4α gene
was determined using DNA from PACs 114E13 and 207N8 and sequence-specific primers with an
AmpliTaq FS Dye Terminator Cycle Sequencing Kit and ABI Prism™ 377 DNA sequencer (ABI, Foster
City, CA). The promoter sequence was examined for transcription factor binding sites using MatInspector
(Quandt et al., 1995) and TFSEARCH (Version 1.3, http://www.genome.ad.jp/kit/tfsearch.html). The
sequences of alternatively spliced mRNAs were confirmed by sequencing PCR™ products generated by
amplification of human liver cDNA using specific primers.
Screening of the HNF-4α gene for mutations

The 12 exons, flanking introns and minimal promoter region were screened for mutations by
amplifying and directly sequencing both strands of the PCR™ product using specific primers (the
sequences of the primers are available at www.diabetes.org/diabetes). The sequence of the missense
mutation (R127W) was confirmed by cloning the PCR™ product into pGEM-T (Promega, Madison, WI) and
sequencing clones representing both alleles. The R127W mutation leads to loss of an Msp I site and
subjects were tested for the presence of this mutation by digestion of the PCR™ product of exon 4 with
Msp I, separation of the fragments by electrophoresis on a 3% NuSieve 3:1 agarose gel (FMC
BioProducts, Rockland, ME) and visualization by ethidium bromide staining. The sequences of the DNA
polymorphisms are based on sequencing both strands of the PCR™ product and were not confirmed
directly by cloning and sequencing the PCR™ product.
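
The Msp I test described above distinguishes alleles by the sizes of the restriction fragments. The sketch below illustrates the arithmetic using the fragment sizes reported in the Results of this example (104, 91 and 76 bp for the normal allele; 104 and 167 bp for the mutant allele). The positions of the cut sites within the amplicon are an assumption made for illustration (the text gives only fragment sizes, not their order), and the function names are hypothetical.

```python
def fragment_sizes(amplicon_len: int, cut_positions: list) -> list:
    """Fragment lengths (largest first) after cutting a linear amplicon."""
    edges = [0] + sorted(cut_positions) + [amplicon_len]
    return sorted((b - a for a, b in zip(edges, edges[1:])), reverse=True)


# Amplicon length implied by the reported fragments: 104 + 91 + 76 = 271 bp.
AMPLICON_LEN = 104 + 91 + 76

# Assumed cut positions: the fragment order along the amplicon is hypothetical.
NORMAL_CUTS = [104, 195]  # two Msp I sites in the normal allele
MUTANT_CUTS = [104]       # R127W destroys the site separating 91 and 76 bp


def expected_bands(genotype: str) -> list:
    """Bands expected on the gel for 'normal', 'heterozygous' or 'mutant'."""
    normal = set(fragment_sizes(AMPLICON_LEN, NORMAL_CUTS))
    mutant = set(fragment_sizes(AMPLICON_LEN, MUTANT_CUTS))
    bands = {"normal": normal, "mutant": mutant, "heterozygous": normal | mutant}
    return sorted(bands[genotype], reverse=True)


if __name__ == "__main__":
    for genotype in ("normal", "heterozygous", "mutant"):
        print(genotype, expected_bands(genotype))
```

Because loss of one site fuses the 91 and 76 bp fragments into a single 167 bp fragment, a heterozygous carrier would be expected to show all four bands, whereas a subject with two normal alleles shows only the three smaller ones.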

Subjects 

The study population consisted of 57 unrelated Japanese subjects attending the Diabetes Clinic,
Tokyo Women's Medical College, who were diagnosed with NIDDM before 25 years of age and/or who
were members of families in which NIDDM was present in three or more generations: age at diagnosis,
20 ± 7.5 years (mean ± SE); male/female, 31/26; and treatment, insulin, 36; oral hypoglycemic agents,
10; and diet, 11. Thirty-two of the subjects met strict criteria for a diagnosis of MODY (i.e., NIDDM in
at least three generations with autosomal dominant transmission and diagnosis before 25 years of age in
at least one affected subject). NIDDM was diagnosed using the criteria of the World Health Organization
(Bennett et al., 1994). At the time of recruitment, informed consent was obtained from each subject and
a blood sample was taken for DNA isolation. Fifty-three unrelated nondiabetic Japanese subjects were
tested for each nucleotide substitution and mutation to determine if the sequence change was a
polymorphism or disease-associated mutation. In addition, 53 Chinese (15), 51 white (16), and 57
African-American unrelated nondiabetic subjects (16) were tested for the R127W mutation.

2. Results 

Organization and partial sequence of human HNF-4α gene. The human HNF-4α gene (gene
symbol, TCF14) consists of 12 exons spanning approximately 30 kb, of which about 10 kb were
sequenced including 1 kb of the promoter region (the gene sequence is available at
www.diabetes.org/diabetes). Human HNF-4α mRNA is alternatively spliced (Hata et al., 1992; Chartier
et al., 1994; Drewes et al., 1996; Kritis et al., 1996), which may generate as many as six different forms
of HNF-4α (FIG. 12). HNF-4α2 is the predominant form present in many adult tissues including liver,
kidney and intestine. The inventors have used RT-PCR™ to determine which HNF-4α transcripts are
expressed in human pancreatic islets. This analysis showed that islets express mRNAs for HNF-4α1, 2
and 3. The inventors could not detect islet transcripts that included exons 1C and 1B, although
transcripts containing these two exons could be detected in human liver by RT-PCR™.

The sequence of 1 kb of the promoter region of the human HNF-4α gene was determined (FIG.
13). The comparison of the sequences of the human and mouse genes showed regions of sequence
conservation that included the predicted start of transcription and the binding sites for several
transcription factors including HNF-6, AP-1, HNF-3, HNF-1α and NF-1. The transcription start site for
the human gene has not been determined directly but has been inferred from studies of the mouse gene,
which showed multiple start sites spread over a 10 bp interval (Zhong et al., 1994; Taraviras et al.,
1994), of which one was defined as nucleotide +1 (Zhong et al., 1994). The sequence homology in the
promoter of the human and mouse genes suggests that transcription of the HNF-4α gene may be
regulated in a similar manner. In this regard, Zhong et al. (1994) have shown that the major
promoter activity in a hepatoma cell line was associated with a 126 bp fragment of the mouse promoter
(nucleotides 289-414 in FIG. 13). There is 83% identity between the human and mouse sequences in this
minimal promoter region.

Mutations and polymorphisms in the HNF-4α gene. The twelve exons, flanking introns and
minimal promoter region were screened for mutations in 57 unrelated Japanese subjects with early-onset
NIDDM/MODY. This analysis revealed one putative mutation (FIG. 14) and seven DNA
polymorphisms/variants (Table 11). The putative mutation in exon 4 at codon 127, CGG (Arg)→TGG
(Trp) (R127W), alters a conserved amino acid that is located in the T-box, a region implicated in receptor
dimerization and DNA binding (Lee et al., 1993; Rastinejad et al., 1995; Gronemeyer and Moras, 1995;
Jiang and Sladek, 1997). The C→T substitution in codon 127 results in the loss of a site for the
enzyme Msp I; digestion of the normal allele generates fragments of 104, 91 and 76 bp, whereas the
mutant allele generates fragments of 104 and 167 bp. PCR™-RFLP analysis showed that the R127W
mutation was not present in any of 214 unrelated nondiabetic subjects of different ethnic groups (53
Japanese, 53 Chinese, 51 white and 57 African-American).

TABLE 11

DNA Polymorphisms/Variants in the Human HNF-4α Gene in Japanese Subjects

                                                   Frequency
Location     Nucleotide        Substitution   Early-onset NIDDM/MODY   Nondiabetic
Promoter     nt 922            G→A            G 0.99, A 0.01           G 1.00, A 0.00
Intron 1A    nt 1364 (+109)    T→C            T 0.99, C 0.01           T 1.00, C 0.00
             nt 1486 (-21)     G→A            G 0.99, A 0.01           G 0.99, A 0.01
Intron 1C    nt 2218 (105)     G→A            G 0.99, A 0.01           G 1.00, A 0.00
Intron 1B    nt 2420 (+8)      A→G            G 0.99, A 0.01           G 0.99, A 0.01
             nt 3142 (-38)     T→C            T 0.28, C 0.72           T 0.24, C 0.76
             nt 3175 (-5)                     C 0.84, T 0.16           C 0.86, T 0.14


The R127W mutation was present in three of five diabetic members of the J2-21 family, a MODY
family characterized by severe microvascular complications (Iwasaki et al., 1988) (FIG. 15). In addition,
subject II-2 must be a carrier since she has children with both normal homozygous and heterozygous
genotypes. The age at diagnosis of diabetes in two of the four subjects with the R127W mutation was
<25 years (subject II-1, 16 years; and subject III-4, 17 years). One of the subjects with the R127W
mutation was diagnosed with diabetes at 90 years of age, indicating the variable penetrance of the
mutant allele. Another subject, the 12-year-old son of subject III-4, has inherited the mutant allele but is
nondiabetic. However, he is not yet beyond the age at risk and may develop diabetes in the future. There
are two subjects with diabetes in the J2-21 family who did not inherit the at-risk allele (subjects III-3 and
-6). Such etiological heterogeneity has been noted previously (Bell et al., 1991).

The seven DNA polymorphisms/variants were located in the promoter region and the introns
(Table 11, FIG. 13). In subject J2-96 (FIG. 15), there was a G→A substitution at nucleotide 922 in the
proximal promoter region which changes the human sequence so that it more closely resembles the
sequence of the mouse gene (FIG. 13). This substitution was not found on screening 53 nondiabetic
subjects. Since this substitution does not alter a conserved residue or disrupt the binding site for one of
the factors predicted to regulate transcription of the HNF-4α gene, the inventors believe that it is a rare
variant rather than a diabetes-associated mutation. However, further studies are necessary to distinguish
between these two possibilities.

The six substitutions found in introns (Table 11) do not disrupt the conserved GT and AG
dinucleotides of splice donor and acceptor sites, respectively, and are thus unlikely to affect splicing.
The substitutions at nucleotides 1486, 2420, 3142 and 3175 were found in both diabetic and
nondiabetic Japanese subjects, indicating that they are polymorphisms rather than diabetes-associated
mutations. The substitutions at nucleotides 1364 and 2218 were found only in two different unrelated
subjects with early-onset NIDDM/MODY. The inventors believe that these are rare variants rather than
diabetes-associated mutations as they are not near the splice donor and acceptor sites but are rather in
the central portion of the intron.

EXAMPLE 5

Hepatic Function in a Family with a Nonsense Mutation (R154X) in
the HNF-4α/MODY1 Gene

MODY is a genetically heterogeneous monogenic disorder characterized by autosomal dominant
inheritance, onset usually before 25 years of age and abnormal pancreatic β-cell function. Mutations in
the hepatocyte nuclear factor (HNF)-4α/MODY1, glucokinase/MODY2 and HNF-1α/MODY3 genes can
cause this form of diabetes. In contrast to the glucokinase and HNF-1α genes, mutations in the HNF-4α
gene are a relatively uncommon cause of MODY and the inventors' understanding of the MODY1 form of
diabetes is based on studies of only a single family, the R-W pedigree. Here the inventors report the
identification of another family with MODY1 and the first in which there has been a detailed
characterization of hepatic function. The affected members of this family, Dresden-11, have inherited a
nonsense mutation, R154X, in the HNF-4α gene and are predicted to have reduced levels of this
transcription factor in the tissues in which it is expressed, including pancreatic islets, liver, kidney and
intestine. Subjects with the R154X mutation exhibited a diminished insulin secretory response to oral
glucose. HNF-4α plays a central role in tissue-specific regulation of gene expression in the liver, including
the control of synthesis of proteins involved in cholesterol and lipoprotein metabolism and the coagulation
cascade. However, subjects with the R154X mutation showed no abnormalities in lipid metabolism or
coagulation except for a paradoxical 3.3-fold increase in serum lipoprotein(a) levels. Nor was there any
evidence of renal dysfunction in these subjects. The results suggest that MODY1 is primarily a disorder
of β-cell function.

1. Methods 

Subjects. 

The study population consisted of members of twelve unrelated families with early-onset NIDDM
ascertained through the Department of Internal Medicine III, University Clinic Carl Gustav Carus of the
Technical University, Dresden, Germany. Families were selected based on the presence of non-insulin-dependent
(type 2) diabetes mellitus (NIDDM) in two or more generations with diagnosis before 35 years
of age in at least one subject. Sufficient family data were available to suggest a diagnosis of MODY in
nine of these families (i.e., NIDDM in three generations with autosomal dominant inheritance and onset
before 25 years of age in at least one affected subject) (Fajans et al., 1994). The remaining three
families were classified as having early-onset NIDDM. The average age at diagnosis of diabetes in
affected members of these twelve families was 29.9 ± 2.8 years (mean ± SEM; range, 14-60 years). The
affected members included 18 men and 13 women, of whom 12, 12 and 7 were being treated with insulin,
oral hypoglycemic agents and diet, respectively. At the time of recruitment, informed consent was
obtained from each subject and blood and urine samples were obtained for DNA isolation and clinical testing.

Screening HNF-4α gene for mutations.

The minimal promoter region (nucleotides -21 to -459) (Zhong et al., 1994) and 10 exons
encoding the HNF-4α form (Drewes et al., 1996) of HNF-4α were screened for mutations by polymerase
chain reaction (PCR™) amplification and direct sequencing of both strands of the amplified PCR™ product
as described previously (Yamagata et al., 1996). Sequence changes were confirmed by cloning the PCR™
product into pGEM-4Z (Promega, Madison, WI) and sequencing clones derived from both alleles. The
sequences of the primers for the amplification and sequencing of the minimal promoter region are
P1, 5'-CAAGGATCCAGAAGAnGGC-3' (SEQ ID NO:120), and P2, 5'-CGTCCTCTGGGAAGATCTGC-3' (SEQ
ID NO:121); the size of the PCR™ product is 479 bp. The sequence of the promoter of the human
HNF-4α gene has been deposited in the GenBank database with accession number U72959.

Linkage analysis.

Family members were typed with the markers D20S43, D20S89, D20S96, D20S119, D20S169
and D20S424, all of which are tightly linked to the HNF-4α gene (Stoffel 1996). Tests for linkage
were carried out using the haplotype formed from these markers and assuming a recombination frequency
between adjacent markers of 0.001 with the computer program ILINK (Lathrop et al., 1984; Lathrop and
Lalouel, 1984). The frequencies of the haplotypes were estimated from the data. The analysis assumed
a disease allele frequency of 0.001 and two liability classes. Liability class 1 included individuals who
were ≥25 years of age with penetrances of 0.00, 0.95 and 0.95 for the normal homozygote, heterozygote
and susceptible homozygote, respectively. Liability class 2 included individuals who were <25 years of
age with penetrances of 0.00, 0.60 and 0.95 for the normal homozygote, heterozygote and susceptible
homozygote, respectively. The affection status of the one subject with impaired glucose tolerance was
coded as affected. The maximum expected lod score (ELOD) was determined using the computer program
SLINK (Ott, 1989; Weeks et al., 1990).
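
As a rough illustration of what a lod score measures, the following sketch computes a two-point lod score for fully informative, phase-known meioses. It is a textbook simplification and not the ILINK or SLINK computation used in the study: it ignores the penetrances, liability classes and multi-marker haplotypes described above, and the function name and its inputs are hypothetical.

```python
import math


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point lod score for phase-known, fully informative meioses.

    Compares the likelihood of the observed meioses at recombination
    fraction `theta` with the likelihood under free recombination (0.5).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")
    nonrecombinants = meioses - recombinants
    if theta == 0.0:
        # Any recombinant is impossible at theta = 0.
        if recombinants > 0:
            return float("-inf")
        log_l_theta = 0.0
    else:
        log_l_theta = (recombinants * math.log10(theta)
                       + nonrecombinants * math.log10(1.0 - theta))
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null


if __name__ == "__main__":
    # Lod at theta = 0 for 1..5 non-recombinant meioses.
    for n in range(1, 6):
        print(n, round(two_point_lod(0, n, 0.0), 2))
```

Under this simplified model each non-recombinant meiosis contributes log10(2), or about 0.30, at a recombination fraction of 0.00, which gives a feel for how few informative meioses a small pedigree can offer.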
Clinical Studies.

A standard 75 g oral glucose tolerance test was given to subjects after a 12 h overnight fast.
Treatment with insulin and oral hypoglycemic agents was discontinued 12 h and 24 h, respectively,
before testing. Blood samples for glucose, insulin, C-peptide and proinsulin were drawn at 0, 30, 60, 90
and 120 min. Fasting blood samples were also drawn for the measurement of insulin, islet cell and
glutamic acid decarboxylase (GAD) antibodies, glycosylated hemoglobin (HbA1c), lipoprotein(a),
apolipoproteins AI, AII, B, CII, CIII and E, cholesterol (total and in VLDL, LDL, HDL, HDL2 and HDL3),
triglycerides (total and in VLDL and LDL+HDL), coagulation time (QUICK test) and partial thromboplastin
time (PTT), fibrinogen, von Willebrand factor antigen (vWFr:Ag), plasminogen activator inhibitor-1 (PAI-1),
tissue-type plasminogen activator (tPA), alanine aminotransferase, γ-glutamyl transferase, bilirubin,
albumin, total protein, hemoglobin, creatinine, urea, amylase, lipase and uric acid. A urine sample (from a
24-hour collection of urine) was taken for measurements of creatinine and microalbumin.
Assays.

Blood glucose was measured with a hexokinase method (Boehringer-Mannheim, Mannheim,
Germany), plasma insulin and C-peptide by radioimmunoassay (DPC Biermann GmbH, Bad Nauheim,
Germany; and C-peptide RIA Diagnostic Systems Laboratories, Sinsheim, Germany, respectively), plasma
proinsulin by ELISA (DRG Instruments, Marburg, Germany), HbA1c by HPLC (DIAMAT Analyzer, Bio-Rad,
Munich, Germany), fibrinogen by the Clauss method (Fibrinogen A Kit, Boehringer-Mannheim), PAI-1 by
bioimmunoassay and ELISA (TC® Actibind PAI-1 and TC® PAI-1 ELISA, Technoclone/Immuno GmbH
Deutschland, Heidelberg, Germany), tPA by ELISA (TintElize® tPA, Biopool AB, Umea, Sweden), vWFr:Ag
enzymatically (ELISA Asserachrom® vWF, Boehringer-Mannheim), insulin- and GAD-Ab by ELISA and
radioimmunoassay (Elias, Freiburg, Germany), islet cell-Ab by an immunofluorescence assay (using a
positive sample from EUROIMMUN GmbH, Groß Grönau, Germany), coagulation and partial
thromboplastin time by the AMAX Analyzer (Munich, Germany). Total cholesterol, cholesterol in VLDL,
HDL, LDL+HDL, and HDL3 were measured by the CHOD-PAP, total triglycerides and triglycerides in VLDL
and LDL+HDL by the GPO-PAP method using the Ciba Corning 550 Express Clinical Chemistry Analyzer
(Boehringer-Mannheim). HDL2-cholesterol was calculated using the formula HDL2 = HDL - HDL3. Samples
for the measurement of cholesterol, triglycerides in VLDL, HDL, LDL+HDL were prepared by preparative
ultracentrifugation using a Beckman Optima tabletop TLX ultracentrifuge with a TLA-120.2 rotor. Serum
creatinine, urea, uric acid, total protein, alanine aminotransferase, γ-glutamyl transferase, bilirubin,
amylase and urine creatinine were measured using the BM Hitachi 717 Chemistry Analyzer (Boehringer-Mannheim).
Lipase was measured using the Monarch System (Sigma Germany, Munich, Germany).
Apolipoproteins AI, AII and B and urine microalbumin were measured using the Behring-Nephelometer BN
II (Behringwerke, Marburg, Germany). Apolipoproteins CIII and E were measured using the Sebia System
(Fulda, Germany), apolipoprotein CII using the RID System (WAK, Bad Homburg, Germany).
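
The HDL2-cholesterol calculation by difference can be sketched in a few lines. The function name is hypothetical; the example inputs are the mean HDL and HDL3 cholesterol values listed for carriers of the R154X mutation in Table 12.

```python
def hdl2_cholesterol(hdl_total_mm: float, hdl3_mm: float) -> float:
    """HDL2-cholesterol by difference, HDL2 = HDL - HDL3 (all values in mM)."""
    if hdl3_mm > hdl_total_mm:
        raise ValueError("HDL3 cholesterol cannot exceed total HDL cholesterol")
    return hdl_total_mm - hdl3_mm


if __name__ == "__main__":
    # Mean HDL and HDL3 cholesterol of the Normal/Mutant group (Table 12).
    print(round(hdl2_cholesterol(1.17, 0.86), 2))  # 0.31
```

The result agrees with the HDL2 value tabulated for the same group, which is a quick consistency check on the table.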

2. Results

Identification of a nonsense mutation in the HNF-4α gene.

Twelve families with early-onset NIDDM/MODY were ascertained for genetic studies of MODY in
subjects of German ancestry. Mutations in the HNF-1α/MODY3 gene (Yamagata et al., 1996) were found
in three of these families (Kaisaki et al., 1997). The HNF-4α gene was screened for mutations in one
affected subject from the remaining nine families. There was a C→T substitution in codon 154 of exon 4
in the proband (II-4) of family Dresden-11 (FIG. 16), which generated a nonsense mutation CGA (Arg) →
TGA (OP) (R154X, FIG. 17). The R154X mutation would result in the synthesis of a truncated protein of
153 amino acids with an intact DNA binding domain but lacking the ligand binding and transactivation
domain (Sladek et al., 1990). In addition to this mutation, there was a silent C→T substitution in the
codon for Ala58 (GCC/GCT) in one subject which did not cosegregate with MODY/early-onset NIDDM.

The presence of the R154X mutation in other members of the Dresden-11 family was determined
by PCR™ amplification and direct sequencing of exon 4. The R154X mutation cosegregated with MODY
in the Dresden-11 family (FIG. 16). All diabetic subjects had the R154X mutation as did a 14-year old
male (III-2) with impaired glucose tolerance. The at-risk haplotype showed some evidence for linkage
with MODY with a lod score of 1.20 at a recombination of 0.00 (the maximum expected lod score in this
pedigree is 1.20).

Age at diagnosis. 

Three subjects were diagnosed with NIDDM between 15-25 years of age and two others at 28
and 44 years (FIG. 16). The subject diagnosed with diabetes at 44 years of age had proliferative
retinopathy at the time of diagnosis, suggesting that the onset of diabetes had been many years earlier.

Clinical severity of diabetes. 

The diabetes in the Dresden-11 family was severe and all the diabetic subjects were treated with
either insulin or oral hypoglycemic agents. Subjects with diabetes of long duration (e.g., I-1, II-4) had
diabetic complications including proliferative retinopathy, macrovascular disease (coronary heart disease)
and peripheral polyneuropathy. Surprisingly, none of the subjects with the R154X mutation had evidence
of nephropathy. Thus, the diabetic phenotype of the Dresden-11 family is very similar to that seen in the
R-W pedigree (Fajans et al., 1994). None of the subjects in the Dresden-11 family were positive for islet,
insulin or GAD antibodies.

Insulin secretory response. 

Previous studies have shown that prediabetic subjects with a mutation in HNF-4α exhibit a
characteristic defect in the normal pattern of glucose-stimulated insulin secretion as well as abnormalities
in other measures of normal β-cell function (Herman et al., 1994; Byrne et al., 1995). The OGTT studies
showed a profound reduction in insulin secretion accompanied by diminished C-peptide and proinsulin
levels in subjects with the R154X mutation (FIG. 18).

Lipid levels.

None of the subjects with the R154X mutation showed evidence of secondary
hypertriglyceridemia, even though several (I-1, II-4, III-1) had poor metabolic control with HbA1c levels of
10.6, 8.8 and 10.1, respectively (Table 12).

TABLE 12

Clinical Parameters of the Dresden-11 family

                                                          Genotype
Parameter                                  Normal/Mutant     Normal/Normal     Reference
                                                             (female/male)     values

Age at diagnosis (years)                   26.40 ± 3.47
Current age (years)                        35.50 ± 7.58      62/41
n (females/males)                          2/4               1/1               --
BMI (kg/m^2)                               25.21 ± 1.15      41.08/22.86       < 25.00
HbA1c (%)                                  8.13 ± 0.78       5.60/5.30         < 6.50
Basal insulin (nM)                         0.067 ± 0.005     0.080/0.040       0.059-0.253
Basal C peptide (nM)                       0.60 ± 0.08       0.68/0.45         < 1.06
Cholesterol (mM), total                    4.72 ± 0.41       5.03/5.01         < 5.20
  in VLDL (mM)                             0.79 ± 0.31       0.21/0.70         0.10-1.40
  in LDL (mM)                              2.86 ± 0.25       3.62/3.34         1.80-5.10
  in HDL (mM)                              1.17 ± 0.18       1.32/1.26         0.80-2.50
  in HDL2 (mM)                             0.31 ± 0.06       0.44/0.27         0.10-0.60
  in HDL3 (mM)                             0.86 ± 0.12       0.88/0.99         0.80-1.90
Triglycerides (mM), total                  0.70 ± 0.13       0.65/1.45         0.40-2.80
  in VLDL (mM)                             0.43 ± 0.13       0.34/1.06         0.10-2.10
  in LDL + HDL (mM)                        0.28 ± 0.02       0.33/0.47         0.20-0.80
Lipoprotein(a) (mg/l)                      816.0 ± 90.4      3.0/6.0           < 250.0
ApoB (g/l)                                 1.38 ± 0.22       1.33/1.38         0.72-1.50
ApoAI (g/l)                                1.86 ± 0.16       1.89/2.00         1.12-1.75
ApoAII (g/l)                               0.32 ± 0.02       0.29/0.53         0.30-0.70
ApoE (mg/l)                                61.2 ± 12.2       65.0/55.0         13.0-76.0
ApoCII (mg/l)                              36.0 ± 5.3        36.0/61.0         7.0-63.0
ApoCIII (mg/l)                             26.7 ± 3.7        23.0/36.0         16.0-45.0
Creatinine (µM)                            9^ 5 -^55         73.0/80.0         < 124.0
Urea (mM)                                  5.6 ± 0.8         6.6/1.0           3.6-8.9
Total protein (g/l)                        111 - \J          77.2/84.0         65.0-85.0
Albumin (g/l)                              38 5 ^ , g        38.5/43.5         37.0-53.0
Alanine aminotransferase (µmol/(l·s))      0.39 ± 0.06       0.39/0.91         0.10-0.67
γ-Glutamyl transferase (µmol/(l·s))        0.54 ± 0.12       0.55/1.11         n 1 ft n fl*?
Bilirubin (µM), total                      16.7 ± 5.2        13.7/24.3         1.0-16.0
Uric acid (µM)                             249 ± 28          317/359           208-416
Exocrine pancreatic function
  Amylase (U/l)                            56.8 ± 6.7        30.0/58.0         17.0-115.0
  Lipase (µmol/(l·s))                      1.22 ± 0.40       0.20/3.00         0.38-3.40
Coagulation parameters
  Coagulation time (%)                     117 ± 6           1Q8/1?R           70-120
  Partial thromboplastin time (s)          33 ± 1            29/35             30-40
  Fibrinogen (g/l)                         3.54 ± 0.23       2.89/3.69         1.50-4.00
  Von Willebrand Factor Antigen (%)        103 ± 11          145/115           70-200
  PAI-1 (ng/ml), total                     35 ± 8            102/40            30-80
  tPA (ng/ml)                              10.6 ± 1.5        17.2/16.0         2.0-10.0
Urine analysis
  Creatinine (mM)                          8.36 ± 0.88       7.96/2.86         4.66-18.00
  Microalbumin (mg/24 h)                   < 2.2             13.5/<2.2         2.2-18.0

J "-t^ ^iiwvwM Willi UiC dlliyiC; 

values. Reference values are those from the Institute of Clinical Laboratory Diagnostics, University Clinic
Carl Gustav Carus, Dresden.

Hepatic and renal function.

HNF-4α is expressed in the liver and kidney and as such mutations in HNF-4α might be expected
to affect the normal function of these tissues (Sladek et al., 1990; Cereghini, 1996). In this regard,
HNF-4α regulates the expression of a number of apolipoproteins including AI, AIV, B and CIII (Cereghini,
1996). The serum apolipoprotein levels and lipoprotein fractions were normal in the subjects with the
R154X mutation except for lipoprotein(a) levels, which were elevated 3.3-fold (Table 12). Lipoprotein(a)
levels have been reported to be elevated in subjects with NIDDM in some studies (Nakagawa et al., 1996;
Hirata et al., 1995) but not others (Durlach et al., 1996; Chico et al., 1996). However, an elevation in
lipoprotein(a) levels in subjects with HNF-4α deficiency appears paradoxical as expression of
lipoprotein(a) is controlled by HNF-1α (Wade et al., 1994), which is in turn regulated by HNF-4α
(Cereghini, 1996). Thus, lower lipoprotein(a) levels, not higher, would be expected in subjects with the
R154X mutation. Further studies will be necessary to determine the relationship between lipoprotein(a)
levels and mutations in HNF-4α.
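
The 3.3-fold figure can be checked against Table 12 with a short calculation. The sketch below assumes that the fold elevation is expressed relative to the upper reference limit for lipoprotein(a) (< 250.0 mg/l); the source does not state the denominator explicitly, and the function name is hypothetical.

```python
def fold_over_upper_limit(observed: float, upper_reference_limit: float) -> float:
    """Ratio of an observed value to the upper limit of its reference range."""
    if upper_reference_limit <= 0:
        raise ValueError("upper reference limit must be positive")
    return observed / upper_reference_limit


if __name__ == "__main__":
    lp_a_mutant_mean = 816.0     # mg/l, Normal/Mutant group mean (Table 12)
    lp_a_upper_reference = 250.0  # mg/l, assumed denominator (reference value)
    print(round(fold_over_upper_limit(lp_a_mutant_mean, lp_a_upper_reference), 1))  # 3.3
```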

HNF-4α also regulates the expression of albumin, fibrinogen and the coagulation factors VII, VIII,
IX and X (Cereghini, 1996; Erdmann and Heim, 1995; Figueiredo and Brownlee, 1996; Naka and
Brownlee, 1996; Hung and High, 1996). The serum levels of albumin and fibrinogen and measurements of
coagulation time were normal in subjects with the R154X mutation (Table 12). HNF-4α is also expressed
in the kidney although the identity of the target genes in this organ is unknown (Sladek et al., 1990;
Cereghini, 1996). The urinary creatinine and microalbumin levels were normal in subjects with the R154X
mutation (Table 12), suggesting that renal function was not impaired in subjects with mutations in the
HNF-4α gene.

EXAMPLES 

Diminished Insulin and Glucagon Secretory Responses to Arginine in Nondiabetic Subjects
with a Mutation in HNF-4α/MODY1 Gene

Nondiabetic subjects with the Q268X mutation in the hepatocyte nuclear factor (HNF)-4α/MODY1
gene have impaired glucose-induced insulin secretion. To ascertain the effects of the
nonglucose secretagogue arginine on insulin and glucagon secretion in these subjects, we studied 18
members of the RW pedigree: 7 nondiabetic mutation negative (ND[-]), 7 nondiabetic mutation positive
(ND[+]), and 4 diabetic mutation positive (D[+]). We gave arginine as a 5 g bolus followed by a 25
minute infusion at basal glucose concentrations and after glucose infusion to clamp plasma glucose at
~200 mg/dl. The acute insulin response (AIR), the 10-60 minute insulin area under the curve (AUC) and
the insulin secretion rate (ISR) were compared, as were acute glucagon response (AGR) and glucagon
AUC. The ND[+] and D[+] groups had decreased insulin AUC and ISR and decreased glucose potentiation
of AIR, insulin AUC, and ISR to arginine administration when compared to the ND[-] group. At basal
glucose concentrations, glucagon AUC was greatest for ND[-], intermediate for ND[+] and lowest for
D[+] group. During hyperglycemic clamp there was decreased suppression of glucagon AUC for both
ND[+] and D[+] groups compared to the ND[-] group. The decreased ISR to arginine in the ND[+] group
compared to the ND[-] group, magnified by glucose potentiation, indicates that HNF-4α affects the
signaling pathway for arginine-induced insulin secretion. The decrease in glucagon AUC and decreased
suppression of glucagon AUC with hyperglycemia suggest that mutations in HNF-4α may lead to α-cell
as well as β-cell secretory defects or to a reduction in pancreatic islet mass.
1. Methods 

Subjects

Eighteen members of the RW pedigree from branches II-2 and II-5, generations III, IV and V were
studied (Fajans, 1990; Fajans et al., 1994). The study was reviewed and approved by the Institutional
Review Board of the University of Michigan Medical Center, and all subjects and/or parents provided
written informed consent. The glycemic status of each subject was determined by oral glucose tolerance
test (OGTT) as defined by the National Diabetes Data Group (NDDG) (1979). Each subject was originally
typed with a series of DNA markers on chromosome 20 to determine whether he or she has inherited the
extended at-risk haplotype (defined by alleles at the loci ADA, D20S17, D20S79, and D20S4) associated
with MODY1 (Bell et al., 1991; Bowden et al., 1992; Cox et al., 1992; Rothschild et al., 1993). When
the Q268X mutation in the HNF-4α gene was shown to be the cause of MODY1 in the RW pedigree
(Yamagata et al., 1996a), subjects were tested directly for this mutation. All the subjects included in this
study, except nondiabetic individual GM11626, have been tested for the presence of the Q268X
mutation. However, his nondiabetic father, IV-16, was tested and he does not have the Q268X mutation.
Based on the OGTT results and the presence or absence of the Q268X mutation or at-risk haplotype, the
family members were subdivided into three groups:

Nondiabetic Q268X mutation-negative group (ND[-])

Seven nondiabetic mutation-negative subjects were studied. GM identification numbers (Human
Genetic Mutant Cell Repository, as given by Bell et al. (1991)), RW pedigree generation and person
numbers as given by Fajans et al. (1994), and age at the time of study were: GM10085, IV-22, 45 years;
GM11429, IV-41, 32 years; GM11626, offspring of IV-16, 17 years; GM10153, offspring of IV-17, 18
years; GM11579, offspring of IV-19, 16 years; GM11331, offspring of IV-21, 21 years; and GM11333,
offspring of IV-21, 22 years. Four of these subjects were offspring of diabetic parents (GM10085,
GM11429, GM10153, and GM11579).

Nondiabetic Q268X mutation-positive group (ND[+])

This group included seven subjects. Two subjects never had diabetes or impaired glucose
tolerance on OGTT: GM11090, offspring of IV-143, 16 years; and GM10668, offspring of IV-141, 16
years. Five subjects had previous abnormalities of glucose tolerance but none had ever had an abnormal
fasting plasma glucose or glycosylated hemoglobin concentration. Two had single diabetic OGTTs 4 and
22 years, respectively, before the study but had numerous normal glucose tolerance tests subsequently:
GM10018, IV-168, 25 years; and GM8072, IV-143, 39 years. Three subjects had fulfilled NDDG
diagnostic criteria for diabetes by OGTT in the past. Prior to the study they had normal OGTTs on 2, 4
and 5 occasions, over 2, 4 and 4 years, respectively. They were: GM11600, offspring of IV-143, 14
years; GM8759, IV-166, 31 years; and GM8073, offspring of IV-143, 19 years.

Diabetic Q268X mutation-positive group (D[+])

The four subjects in this group had consistently diabetic OGTTs for 6 or more years or had mild
fasting hyperglycemia (< 200 mg/dl) when untreated. They were GM8106, III-35, 59 years; GM7974,
IV-141, 43 years; GM8107, IV-165, 26 years; and GM10724, offspring of IV-142, 17 years. Subject
GM8106 was treated with tolbutamide between 1958 and 1968 and with chlorpropamide since May,
1995. When untreated, his highest fasting plasma glucose was 160 mg/dl and his highest total
glycosylated hemoglobin 9.1% (normal < 6.3%). On 100 mg of chlorpropamide per day, his fasting
plasma glucose was 91 mg/dl and glycosylated hemoglobin was 5.3%. Chlorpropamide was discontinued
for 26 days before the study and fasting plasma glucose was 99 mg/dl and total glycosylated hemoglobin
concentration was 5.8% on the day of the study. Subject GM7974 was treated with diet alone. She had
diabetic OGTTs intermittently since 1969; OGTTs were consistently diabetic since 1990. Her fasting
plasma glucose was 84 mg/dl and her total glycosylated hemoglobin was 6.9% at the time of the study.
Subject GM8107's highest fasting plasma glucose was 192 mg/dl and highest total glycosylated
hemoglobin was 9.5% when untreated. When treated with glyburide 1.25 mg daily, she had normal
fasting and postprandial plasma glucose concentrations and a total glycosylated hemoglobin of 6.7%.
Glyburide was discontinued 11 days before the study. Her fasting plasma glucose concentration was 106
mg/dl and her total glycosylated hemoglobin was 6.9% on the day of the study. Subject GM10725 had
been treated with glyburide 2.5 mg twice daily since 1989. Her highest total glycosylated hemoglobin
concentration was 9.0%. She discontinued medication 5 days before the study; her fasting plasma
glucose was 158 mg/dl and her total glycosylated hemoglobin was 7.7% at the time of the study.

Subjects were studied in the University of Michigan General Clinical Research Center (CRC).
Subjects were admitted to the CRC in the evening and studied in the recumbent position after a 10-12
hour overnight fast. An intravenous sampling catheter was inserted in a retrograde direction in a dorsal
vein of the hand and the hand was kept in a wooden box thermostatically heated to 60°C to achieve
arterialization of venous blood. A second catheter for insulin, arginine and glucose administration was
inserted into the contralateral antecubital vein. In subjects with fasting hyperglycemia, a small
intravenous bolus of human regular insulin (0.007 U/kg or approximately 0.5 U) was given at -50 minutes
to lower the plasma glucose to approximately 75 mg/dl.

Blood samples for measurement of basal glucose, insulin, C-peptide, and glucagon concentrations
were obtained at -30, -20, -10, and 0 minutes. At 0 minutes, arginine was administered. The total
arginine dose was calculated as 0.41 gm/kg body weight to a maximum of 30 grams. At time 0, 5 grams
of arginine was administered as an IV bolus over 30 seconds and at time 6 minutes, the remaining
arginine was infused with a pump at a constant rate over 25 minutes. Samples were drawn at 2, 3, 5, 7,
10, 20, and 30 minutes for measurement of glucose, insulin, C-peptide, and glucagon. Following the first
arginine bolus and infusion, there was a 60 minute washout period. Blood samples for measurement of
the same constituents were obtained at 40, 50, 60, 70, 80, and 90 minutes. At 90 minutes glucose
(160 mg/kg) was administered over 30 seconds and a variable rate infusion of 20% dextrose with 10 mEq
KCl/l was begun to clamp the plasma glucose level at 200 mg/dl for the remainder of the study as
determined by frequent bedside blood glucose measurements. Blood samples for the above constituents
were obtained at 92, 93, 95, 97, 100, 110, 120, 130, 140, and 150 minutes. At 150 minutes, arginine
(0.41 gm/kg, maximum 30 grams) was again administered as a 5 gram bolus followed after 5 minutes by
an infusion over 25 minutes, as previously, and samples were drawn at 152, 153, 155, 157, 160, 170,
180, 190, 200, 210, 220, 230, and 240 minutes for measurement of glucose, insulin, C-peptide and
glucagon.
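
The weight-based arginine dosing described above can be sketched as follows. This is an illustrative calculation only, not a dosing tool: the per-kilogram dose is read here as 0.41 g/kg, and the function, class and parameter names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ArginineDose:
    total_g: float
    bolus_g: float
    infusion_g: float
    infusion_rate_g_per_min: float


def arginine_dose(weight_kg: float,
                  dose_g_per_kg: float = 0.41,   # assumed reading of the protocol
                  max_total_g: float = 30.0,
                  bolus_g: float = 5.0,
                  infusion_min: float = 25.0) -> ArginineDose:
    """Split a capped, weight-based arginine dose into bolus and infusion."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    total = min(dose_g_per_kg * weight_kg, max_total_g)
    bolus = min(bolus_g, total)
    infusion = total - bolus
    return ArginineDose(total, bolus, infusion, infusion / infusion_min)


if __name__ == "__main__":
    for weight in (60.0, 70.0, 80.0):
        print(weight, arginine_dose(weight))
```

With these assumptions the 30 gram cap applies above roughly 73 kg, so heavier subjects would all receive a 5 gram bolus followed by 25 grams infused at 1 g/min.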
Assay procedures

All blood samples were collected on ice and stored at -70°C until assayed. Plasma glucose was
measured on a Kodak Ektachem 700 Analyzer using a hexokinase method (intra-assay coefficient of
variation (CV) 1.7% at 5.0 mmol/l and 1.2% at 16.1 mmol/l). Immunoreactive insulin was measured by
double antibody radioimmunoassay (RIA) (intra-assay CV 6.4%) (Hayashi et al., 1977). C-peptide was
measured by a specific RIA (intra-assay CV 3.9%) (Faber et al., 1978). Glucagon was measured by
double-antibody radioimmunoassay (intra-assay CV 3.2%) (Hayashi et al., 1977). All samples were
measured in duplicate and their means were used. Samples from individual subjects were measured in a
single assay. All assays were performed in the Michigan Diabetes Research and Training Center
Chemistry Core laboratory.
Data analysis 

Acute insulin responses (AIR), acute C-peptide responses (ACR), and acute glucagon responses
(AGR) were calculated as the mean of the 2, 3, 4, and 5 minute hormone levels minus the mean of the
-10, -5 and 0 minute hormone levels. Glucose, insulin, C-peptide, and glucagon areas under the curve
were calculated with the trapezoidal rule for the time interval 10 to 60 minutes when the arginine bolus
was administered at time 0 and the arginine infusion began at time 6 minutes. Baseline values, calculated
as the mean hormone levels measured at -10, -5, and 0 minutes immediately preceding the arginine bolus,
were subtracted from the areas under the curve. Insulin secretion rates were calculated by deconvolution
of C-peptide values (Polonsky et al., 1986). All of these indices of insulin secretion were assessed during
arginine administration at baseline glucose levels, during glucose administration, and during arginine
administration during the hyperglycemic clamp. Slope of potentiation was calculated as the difference
between the AIR or ACR to arginine obtained during the hyperglycemic clamp and at baseline glucose
levels divided by the difference between these two glucose levels (Halter et al., 1979). Results are
expressed as means ± standard error of the mean. Statistical significance of differences among groups
was assessed with chi-square and unpaired t-tests. The primary comparisons of interest were between
the ND[-] and ND[+] group. P < 0.05 was defined as the limit of statistical significance.
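
The indices defined above can be sketched in code. This is a minimal illustration under stated assumptions, not the analysis software used in the study: baseline subtraction is implemented by subtracting the baseline mean from each sample before applying the trapezoidal rule, sample times are assumed to be in ascending order, all function names are hypothetical, and the example profile is made-up data. Deconvolution of C-peptide values to obtain ISR is not shown.

```python
from statistics import mean

BASELINE_TIMES = (-10, -5, 0)  # minutes relative to the arginine bolus
EARLY_TIMES = (2, 3, 4, 5)     # minutes used for the acute response


def acute_response(times, levels):
    """Mean of the 2-5 minute levels minus the mean of the baseline levels."""
    by_time = dict(zip(times, levels))
    early = mean(by_time[t] for t in EARLY_TIMES)
    baseline = mean(by_time[t] for t in BASELINE_TIMES)
    return early - baseline


def baseline_subtracted_auc(times, levels, start=10, stop=60):
    """Trapezoidal area above baseline between `start` and `stop` minutes."""
    by_time = dict(zip(times, levels))
    baseline = mean(by_time[t] for t in BASELINE_TIMES)
    window = [(t, v - baseline) for t, v in zip(times, levels) if start <= t <= stop]
    return sum((t1 - t0) * (v0 + v1) / 2.0
               for (t0, v0), (t1, v1) in zip(window, window[1:]))


def slope_of_potentiation(response_clamp, response_basal, glucose_clamp, glucose_basal):
    """Change in acute response per unit change in glucose level."""
    return (response_clamp - response_basal) / (glucose_clamp - glucose_basal)


# Made-up hormone profile, for illustration only.
example_times = [-10, -5, 0, 2, 3, 4, 5, 10, 20, 30, 40, 50, 60]
example_levels = [10, 10, 10, 50, 60, 40, 30, 20, 30, 40, 30, 20, 10]

if __name__ == "__main__":
    print(acute_response(example_times, example_levels))
    print(baseline_subtracted_auc(example_times, example_levels))
    print(slope_of_potentiation(230.0, 50.0, 200.0, 100.0))
```

Note that this area is in concentration × minutes; whether the tabulated AUC values were further normalized by time is not stated in the source.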

Eighteen members of the RW pedigree were studied: seven non-diabetic mutation negative (ND[-]),
seven non-diabetic mutation positive (ND[+]), and four diabetic mutation positive (D[+]) (Table 13).
There were no significant differences among groups with regard to gender or age, although D[+] subjects
tended to be older. All subjects were nonobese. Fasting glucose and insulin levels did not differ
significantly among groups although D[+] subjects tended to have higher glucose levels and lower insulin
levels. Fasting C-peptide levels were lower in D[+] subjects compared to ND[-] subjects. Fasting
glucagon levels did not differ among groups. Glycosylated hemoglobin concentration did not differ
between the two nondiabetic groups, but was higher in the D[+] group.

Table 13

Characteristics of Subjects from RW Pedigree by Glucose Tolerance and Mutation Status

Glucose Tolerance               Nondiabetic      Nondiabetic      Diabetic
Genotype*                       [-]              [+]              [+]

Number and gender (M/F)         5/2                               1/3
Age (years)                     24 ± 4           23 ± 4           36 ± 9
Body Mass Index (kg/m^2)        25.2 ± 1.5       23.1 ± 7.0       22.5 ± 0.4
Fasting glucose (mg/dl)         91 ± 2           87 ± 2           112 ± 16
Fasting insulin (µU/ml)         10 ± 1           11 ± 2           7 ± 1
Fasting C-peptide (ng/ml)       1.8 ± 0.1**      1.6 ± 0.2        1.3 ± 0.2
Fasting glucagon (pg/ml)        73 ± 6           64 ± 9           77 ± 12
Glycosylated hemoglobin         5.5 ± 0.1**      5.7 ± 0.2**      7.8 ± 0.4

*[-] = Normal/Normal
 [+] = Normal/Q268X Mutation
**p < 0.05 vs. diabetic [+]
All values are mean ± SEM



FIG. 19 demonstrates the protocol and illustrates concentrations of glucose (FIG. 19A), insulin
(FIG. 19B), C-peptide (FIG. 19C), and glucagon (FIG. 19D) during the three phases of the study. These
were: A) administration of arginine (bolus and infusion) at basal glucose concentrations, B) administration
of glucose (bolus and variable rate infusion) to clamp the glucose level at 200 mg/dl, and C) administration
of arginine (bolus and infusion) during the hyperglycemic clamp.

Table 14 summarizes average glucose levels; acute insulin responses (AIR) and C-peptide
responses (ACR) to arginine; and hormone areas under the curve (AUC) and insulin secretion rate (ISR)
measured 10 to 60 minutes following commencement of the three study phases. These are A)
administration of arginine at basal glucose concentrations, B) administration of glucose, and C)
administration of arginine during the hyperglycemic clamp.

Table 14: Plasma Concentrations of Glucose, Acute Insulin and C-peptide Responses (AIR and
ACR), Areas Under the Curve (AUC 10-60 minutes) for Insulin and C-peptide and Insulin
Secretion Rate (ISR) during administration of A) Arginine at basal glucose concentrations (Bolus
and Infusion), B) Glucose (Bolus and Infusion) and C) Arginine (Bolus and Infusion) during
hyperglycemic clamp.

Period / Group            Nondiabetic [-]    Nondiabetic [+]    Diabetic [+]

A. Arginine administration at basal glucose concentration
  Glucose (mg/dl)*        107 ± 3            102 ± 2            115 ± 16
  AIR (µU/ml)             48 ± 10            70 ± 19            27 ± 7
  ACR (ng/ml)             3.05 ± 0.61        3.25 ± 0.44        2.19 ± 0J6
  AUC_I (µU/ml)           78.5 ± 7.7         25^6 ±.5           33±a8
  AUC_C (ng/ml)           205 ± 12           ± „ | ,
  ISR                     76 ± 6             J 1 - "J
B. Glucose administration
  Glucose (mg/dl)*        207 ± 2            207 ± 5            203 ±
  AIR (µU/ml)             72 ± 10            63 ± 15            16 ± 6
  ACR (ng/ml)             4.03 ± 0.61        2.83 ± 0.54        1.25 ± 0.58
  AUC_I (µU/ml)           43.9 ± 6.3         47.1 ± 11.4        16.1 ±4.
  AUC_C (ng/ml)           131 ± 12           '0±B               61 ± 22
  ISR                     63 ± 4             61 ± 6             33 ± 2
C. Arginine administration during hyperglycemic clamp
  Glucose (mg/dl)*        198 ± 2            209 ± 7            201 ± 6
  AIR (µU/ml)             271 ± 33           62 ± 36            50 ± 10
  ACR (ng/ml)             10.33 ± 1.31       5.87 ± 0.72        3.21 ± 0.91
  AUC_I (µU/ml)           628 ± 69           49 ± 40            25 ± 7
  AUC_C (ng/ml)           739 ± 52           209 ± 40           109 ± 42
  ISR                     276 ± 18           101 ± 19           54 ± 16

* mean for period 10-60 minutes
All values are mean ± SEM

••d<0 05 "p<0.01 ^p<0.Q01,NDl-lvsNDl] 

' p < 0.05 ^ P < 0 01 ' P < ° 
*p < 0.05D[+lvsNDl+l 
Effects of Arginine and Glucose on Insulin Secretion

Administration of Arginine at Basal Glucose Concentrations

At baseline, glucose levels did not differ among the groups (Table 13). After the 5 g arginine
bolus, AIR and ACR did not differ among groups but tended to be lower for the D[+] group (Table 14).
During and after the subsequent arginine infusion, glucose levels were slightly higher at 10, 20 and 30
minute intervals in the ND[-] as compared to the ND[+] group (FIG. 19), but the average glucose levels
during the 10-60 minute time interval (Table 14) and the glucose area under the curve (1171 ± 99 vs
1012 ± 141 mg/dl, respectively, p = 0.37) did not differ. Insulin and C-peptide levels rose to a peak at
30 minutes in the ND[-] group but were markedly decreased in both the ND[+] and D[+] groups (FIG. 19).
The insulin area under the curve (AUC_I) and C-peptide area under the curve (AUC_C) were significantly
reduced in the ND[+] group compared to the ND[-] group (Table 14). They were further reduced in the
D[+] group compared to the ND[+] group (Table 14). ISR was significantly reduced in ND[+] compared to
ND[-] subjects and further reduced in D[+] compared to ND[+] subjects (Table 14).
Administration of Glucose 

Glucose levels did not differ among the groups during the bolus and the variable rate glucose
infusion (Table 14). AIR and ACR to glucose did not differ between the ND[-] and ND[+] groups but were
significantly reduced in the D[+] group compared to the ND[-] group (FIG. 19, Table 14). AUCI, AUCC and
ISR during the glucose infusion did not differ between the ND[-] and ND[+] groups (Table 14). They were
reduced in the D[+] group compared to the ND[-] group (Table 14).

Administration of Arginine during the Hyperglycemic Clamp

Glucose levels did not differ among the groups during the variable rate glucose infusion and
second arginine bolus and infusion (Table 14). At hyperglycemic plasma glucose levels, as compared to
euglycemic levels, AIR and ACR to arginine, and AUCI, AUCC and ISR were enhanced and differences
among groups were greatly magnified (FIG. 19, Table 14). All indices of insulin secretion were
significantly reduced in the ND[+] group compared to the ND[-] group and there was a further reduction in
the D[+] group (Table 14).

FIG. 20A and FIG. 20B demonstrate the slopes of potentiation for insulin and C-peptide,
respectively. Glucose potentiation of arginine-stimulated insulin secretion was reduced in both the ND[+]
(0.80 ± 0.18) and D[+] (0.24 ± 0.04) groups compared to the ND[-] group (2.12 ± 0.25, p < 0.001).
The insulin slope of potentiation was also reduced in the D[+] group compared to the ND[+] group (p < 0.06).
Glucose potentiation of arginine-stimulated C-peptide secretion was also reduced in the ND[+] (0.02 ±
0.001) and D[+] (0.01 ± 0.001) groups compared to the ND[-] group (0.07 ± [illegible], p < 0.01).
Effects of Arginine on Plasma Glucagon Concentrations

At baseline, glucagon levels did not differ among groups (Table 13). Acute glucagon responses to
the 5 g bolus of arginine administered at basal glucose concentrations did not differ significantly among
ND[-], ND[+], and D[+] groups (104 ± 19, 92 ± 16, and 82 ± 23 pg/ml, respectively). On the other hand,
the glucagon area under the curve (10-60 minutes) during and following the arginine infusion at basal
glucose concentrations was reduced in D[+] compared to ND[-] subjects (4,778 ± 1,087 vs. 7,549 ± 639
pg/ml, p < 0.05). ND[+] subjects showed intermediate values (6,772 ± 734 pg/ml, p = 0.09 vs. ND[-]
group). During the hyperglycemic clamp there were no significant differences among glucagon areas
under the curve for any of the groups (4,237 ± 406, 3,963 ± 508, and 2,941 ± 568 pg/ml, for ND[-],
ND[+] and D[+], respectively). To assess the impact of glucose infusion on the glucagon response to
arginine in the three study groups, the inventors assessed the differences in glucagon area under the
curve between the euglycemic and hyperglycemic periods. Decreases in glucagon areas induced by the
hyperglycemic clamp between the first and the second arginine infusion were 3312 ± 404, 1809 ± 387,
and 1836 ± 535 pg/ml for the ND[-], ND[+] and D[+] groups, respectively (P < 0.02, ND[-] vs. ND[+]).

EXAMPLE 7

MODY Due to Mutations in the HNF-4α Binding Site in the HNF-1α Gene Promoter

Recent studies have shown that mutations in the transcription factor hepatocyte nuclear factor
(HNF)-1α are the cause of one form of maturity-onset diabetes of the young, MODY3. These studies
have identified mutations in the mRNA and protein coding regions of this gene that result in the synthesis
of an abnormal mRNA or protein. Here, the inventors report an Italian family in which an A→C
substitution at nucleotide -58 of the promoter region of the HNF-1α gene cosegregates with MODY. This
mutation is located in a highly conserved region of the promoter and disrupts the binding site for the
transcription factor HNF-4α, mutations in the gene encoding HNF-4α being another cause of MODY
(MODY1). This result demonstrates that decreased levels of HNF-1α per se can cause MODY. Moreover,
it indicates that both the promoter and coding regions of the HNF-1α gene should be screened for
mutations in subjects thought to have MODY because of mutations in this gene.

1. Method

Subjects 

The MODY family Italy-1 was ascertained through the diabetes clinic of Santo Spirito's Hospital.
Affection status was determined using criteria of the National Diabetes Data Group. The affection status
of unaffected family members was defined as normal or impaired based on the results of a standard 75 g 
OGTT. This study had institutional approval and all subjects gave informed consent. 
Linkage analysis

Family members were genotyped with the markers D12S321, D12S76 and UC-39, all of which
are tightly linked to the HNF-1α gene (MODY3) (Yamagata et al., 1996). The forward and reverse primers
for the polymorphic sequence tagged site (STS) UC-39 are 5'-GCAACAGAGCAAGACTCCATCTCA-3' (SEQ
ID NO:122) and 5'-GAGTTTAATGGAAGAACTAACC-3' (SEQ ID NO:123), respectively, and the PCR
included initial denaturation at 94°C for 6 min and 35 cycles of denaturation at 94°C for 1 min,
annealing at 63°C for 1 min and extension at 72°C for 1 min with a final extension at 72°C for 10 min.
The forward primer was labeled with 32P and the MgCl2 concentration in the reaction was 1.0 mM. The
PCR was carried out in a GeneAmp 9600 PCR System (Perkin Elmer, Norwalk, CT). The PCR products
were separated by electrophoresis on a 5% polyacrylamide sequencing gel and visualized by
autoradiography. Tests for linkage were carried out using the haplotype formed from D12S321, D12S76
and UC-39 and assuming a recombination frequency between adjacent markers of 0.001 with the
computer program MLINK from the LINKAGE package (version 5.1) (Lathrop et al., 1985). The
frequencies of the haplotypes were estimated from the data. The analysis assumed a disease allele
frequency of 0.001 and two liability classes. Liability class 1 included individuals whose age was >25
years of age with penetrances of 0.00, 0.95 and 0.95 for the normal homozygote, heterozygote and
susceptible homozygote, respectively. Liability class 2 included individuals <25 years of age with
penetrances of 0.00, 0.50 and 0.95 for the normal homozygote, heterozygote and susceptible
homozygote, respectively. The affection status of the one subject with impaired glucose tolerance was
coded as unknown.
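
The penetrance model described above can be written down as a small lookup table. The sketch below
is only an illustration of how the two liability classes combine with genotype frequencies. It
assumes Hardy-Weinberg proportions (an assumption not stated in the text) and reads the disease
allele frequency as 0.001. It is not a reimplementation of MLINK.

```python
# Disease allele frequency, read here as 0.001 (assumed reading of the source text).
DISEASE_ALLELE_FREQ = 0.001

# Penetrance by genotype: (normal homozygote, heterozygote, susceptible homozygote).
PENETRANCE = {
    1: (0.00, 0.95, 0.95),  # liability class 1: age > 25 years
    2: (0.00, 0.50, 0.95),  # liability class 2: age < 25 years
}


def genotype_frequencies(q):
    """Hardy-Weinberg genotype frequencies for disease allele frequency q."""
    p = 1.0 - q
    return (p * p, 2.0 * p * q, q * q)


def expected_prevalence(liability_class, q=DISEASE_ALLELE_FREQ):
    """Probability that a random individual in the given class is affected."""
    freqs = genotype_frequencies(q)
    pens = PENETRANCE[liability_class]
    return sum(f * w for f, w in zip(freqs, pens))


for cls in (1, 2):
    print(cls, expected_prevalence(cls))
```

Under these assumptions nearly all expected cases are heterozygotes, which is why the lower
heterozygote penetrance in class 2 roughly halves the expected prevalence.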

Identification of mutations 

Each exon and the minimal promoter region of the HNF-1α gene of subjects II-5 and III-1 were
screened for mutations as described previously (Yamagata et al., 1996; Kaisaki et al., 1997). The
mutation was confirmed by cloning the PCR product into pGEM-4Z and sequencing clones derived from
both alleles. The presence of the mutation in other family members and unrelated nondiabetic subjects
was tested by PCR amplification of the proximal promoter region and direct sequencing.

2. Results 

Linkage studies 

The NIDDM in the pedigree Italy-1 has the clinical features of MODY including autosomal
dominant inheritance and age at diagnosis <25 years in multiple family members (FIG. 21). The six
affected members are treated with either insulin (individuals II-1, II-5 and III-9) or oral hypoglycemic
agents ([illegible], III-1 and III-2). The three subjects on insulin therapy showed evidence of diabetic
complications including retinopathy (II-1 and II-5) and nephropathy (III-9). One member of this pedigree,
III-6, has impaired glucose tolerance.

The polymorphic markers D12S321, D12S76 and UC-39 which are closely linked to the HNF-1α
gene (order: cen - D12S321 - D12S76 - HNF-1α - UC-39 - qter) were typed in this family. The haplotype
3-3-7 co-segregated with MODY with no obligate recombinants (FIG. 21). One subject with IGT (age, 18
years) also inherited this haplotype as did two unaffected young women, individuals III-6 and III-13, of 21
and 14 years of age, respectively. These three subjects may be at risk of developing diabetes in the
future. The LOD score in this family was 1.28 at a recombination fraction of 0.00. Although this LOD
score does not meet formal criteria for establishing linkage (i.e., the LOD score is <3.0), the p value
associated with the evidence for linkage is 0.008 which is sufficient to justify a search for mutations in
the HNF-1α gene.
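
The relationship between a LOD score and a p value can be illustrated with a short calculation. The
text does not say how the p value of 0.008 was derived. The sketch below assumes the conventional
large-sample approximation in which 2 ln(10) × LOD is treated as a chi-square statistic with one
degree of freedom and the tail probability is halved because the test for linkage is one-sided.
The function name is hypothetical.

```python
import math


def lod_to_p_value(lod):
    """Approximate one-sided p value for a non-negative LOD score.

    Assumes 2*ln(10)*LOD is chi-square distributed with 1 degree of freedom
    when there is no linkage; the tail is halved for a one-sided test.
    """
    if lod < 0:
        raise ValueError("LOD score must be non-negative")
    chi_square = 2.0 * math.log(10.0) * lod
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    two_sided_tail = math.erfc(math.sqrt(chi_square / 2.0))
    return 0.5 * two_sided_tail


p_value = lod_to_p_value(1.28)
print(f"LOD 1.28 -> p = {p_value:.4f}")
```

Under this assumption a LOD score of 1.28 gives a p value of roughly 0.0076, consistent with the
reported 0.008, while the formal threshold of 3.0 corresponds to a p value near 0.0001.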

Mutation screening. 

Two diabetic subjects, II-5 and III-1, were screened for mutations in the HNF-1α gene. No
mutations were found on screening the mRNA/protein coding regions, exons 1-10, although the subjects
were heterozygous for several previously described polymorphisms (Yamagata et al., 1996). Since no
mutations were found in the coding region of the HNF-1α gene, the proximal promoter region was
screened. This analysis revealed that both affected subjects were heterozygous for an A→C substitution
at nucleotide -58 which is located in a highly conserved region of the promoter of the HNF-1α gene that
includes the binding site for HNF-4α (FIG. 22) (Tian and Schibler, 1991; Kuo et al., 1992). Since
this mutation does not lead to gain or loss of a site for a restriction endonuclease, it was tested for by
PCR amplification and sequencing. The A→C substitution at nucleotide -58 cosegregated with
the at-risk haplotype in the Italy-1 pedigree (FIG. 21) and was not present in a sample of 50 unrelated white
subjects, implying that it is the mutation responsible for MODY in this family.

EXAMPLE 8
Mutation in HNF-1β associated with MODY
HNF-1α and HNF-4α are members of a complex transcriptional regulatory network which
includes other homeodomain proteins and nuclear receptors as well as members of the forkhead/winged
helix and leucine zipper CCAAT/enhancer binding protein families (Cereghini, 1996). The inventors have
screened two other members of this network, HNF-1β (Mendel et al., 1991a; De Simone et al., 1991;
Rey-Campos et al., 1991; Bach and Yaniv, 1993) and the bifunctional protein dimerization cofactor of
HNF-1 (DCoH)/pterin-4α-carbinolamine dehydratase (PCBD) (Mendel et al., 1991b; Citron et al., 1992),
for mutations in Japanese subjects with MODY. No diabetes-associated mutations were found in DCoH.
However, the inventors found one subject with a nonsense mutation, R177X, in HNF-1β which co-
segregated with early-onset diabetes. The identification of mutations in three members of the HNF family
of transcription factors indicates the importance of this regulatory network in the maintenance of glucose
homeostasis.

1. Methods 

Study population. 

The study population consisted of 57 unrelated Japanese subjects attending the Diabetes Clinic
of Tokyo Women's Medical College who were diagnosed with NIDDM before 25 years of age and/or who
were members of families in which NIDDM was present in three or more generations: age at diagnosis
20.1 ± 7.5 years (mean ± SE); male/female, 31/26; and treatment, insulin - 36, oral hypoglycemic agents
- 10, and diet - 11. These subjects had been screened for mutations in the HNF-1α/MODY3 gene and all
were negative for mutations in this gene (Lazzaro et al., 1992). Thirty-two of the subjects met strict
criteria for a diagnosis of MODY (NIDDM in at least three generations with autosomal dominant
transmission and diagnosis before 25 years of age in at least one affected subject). NIDDM was
diagnosed using the criteria of the World Health Organization (Bennett, 1994). At the time of
recruitment, informed consent was obtained from each subject and a blood sample was taken for DNA
isolation. Fifty-three unrelated nondiabetic Japanese subjects were tested for each nucleotide
substitution and mutation to determine if the sequence change was a polymorphism or disease-associated
mutation.

Pedigree J2-20. 

The proband (subject III-2, FIG. 25) presented with glucosuria at 10 years of age and was
hospitalized. She was diagnosed with diabetes and treated with insulin for two days and then with diet
only for two years. At 12 years of age, she resumed insulin therapy (28 U/day). She came to clinical
attention again at 21 years because of a pyelonephritis and poorly controlled diabetes. At 23 years of
age, she was admitted to the hospital of Tokyo Women's Medical College because of blurred vision. Her
urine C-peptide levels at this time were 3.2 μg/day (normal, 50 ± 25 μg/day) indicating low insulin secretory
capacity. Despite persistent high blood glucose levels, she had no history of ketosis. The subject was
diagnosed with NIDDM based on her clinical course. Subject III-3 presented with general fatigue at 15
years of age. He had gained 15 kg during the previous three months and his weight at the time of
presentation was 75 kg. He was diagnosed with diabetes and was treated first with insulin and then diet
and exercise. He was well controlled when he maintained his weight at 60 kg. At 18 years of age, he
had gained weight again and insulin treatment was initiated. His urinary C-peptide at this time was
57.5 μg/day with fasting C-peptide and glucose levels of 2.4 ng/ml and 106 mg/dl, respectively. There was
no history of ketosis and he was diagnosed with NIDDM. He presently shows diminished pancreatic β-cell
function with no increase in C-peptide levels following administration of glucagon. All individuals shown
in FIG. 25 were invited to participate in this study but many declined to do so.
Isolation and partial sequence of human HNF-1β gene.

The PAC clone 319P12 containing the human HNF-1β gene was isolated from a library (Genome
Systems, St. Louis, MO) by screening PAC DNA pools using polymerase chain reaction (PCR™) and the
primers vHNFP1 (5'-CCTCATGGAGAAACATCCTAAGT-3') (SEQ ID NO:124) and vHNFP2
(5'-AGGGAGTGCACGGCTGAGCTCCTG-3') (SEQ ID NO:125). The sequences of the exons, flanking
introns and promoter region were determined by sequencing PCR™ products and appropriate restriction
fragments cloned into pGEM-4Z (Promega, Madison, WI) with an AmpliTaq FS Dye Terminator cycle
sequencing kit (Perkin-Elmer, Norwalk, CT) and ABI Prism 377 DNA sequencer. Primers for PCR™ and
sequencing were selected using the exon-intron organization of the human HNF-1α gene (Yamagata
et al., 1996a) as a guide since related genes often have similar exon-intron organizations. The partial
sequence of the human HNF-1β gene including promoter has been deposited in the GenBank database
under accession numbers U90279-90287 and U96079.
Mutation screening. 

The nine exons, flanking introns and minimal promoter region of the HNF-1β gene were amplified
using PCR™ and specific primers (Table 17) and the PCR™ products were sequenced from both ends as
described above. PCR™ for exon 1 was carried out using ELONGASE Enzyme Mix (Life Technologies,
Grand Island, NY) with denaturation at 94°C for 1 min followed by 35 cycles of denaturation at 94°C for
30 s, annealing at 55°C for 30 s and extension at 68°C for 1 min, and final extension at 68°C for
[duration illegible]. PCR™ for exons 2-9 was carried out using Taq DNA polymerase and 1.5 mM MgCl2 with
denaturation at 94°C for 5 min followed by 35 cycles of denaturation at 94°C for 30 s, annealing at
60°C for 30 s and extension at 72°C for 30 s, and final extension at 72°C for 10 min. The sequence of
each mutation was confirmed by cloning the PCR™ product into pGEM-T Easy (Promega, Madison, WI)
and sequencing clones representing both alleles. Exons 2-4 of the DCoH gene were amplified using Taq
DNA polymerase/1.5 mM MgCl2 and specific primers (Table 16) and sequenced as described above. Exon
1 of the DCoH gene encoding the 5'-untranslated region and the initiating Met was refractory to PCR™
amplification and therefore was not screened for mutations. The presence of a specific mutation or
polymorphism in other individuals was determined by PCR-RFLP analysis if it resulted in the gain/loss of a
site for a restriction endonuclease, or PCR™ and direct sequencing if there was no change in a site.
Linkage studies. 

The human HNF-1β (STS WI-7310) and DCoH genes were mapped and confirmed to YACs 969C9
(chromosome 17) (Schuler et al., 1996) and 849H3 (chromosome 10), respectively. The adjacent
polymorphic STSs D17S1788 and D10S1688 were tested for linkage with NIDDM in Japanese affected
sib pairs (258 and 268 possible pairs, respectively). In the genome-wide screen of Mexican American
affected sib pairs, the HNF-1β and DCoH genes are in the intervals D17S1293-D17S1299 and
D10S589-D10S535, respectively (Schuler et al., 1996).

Transactivation studies of normal and mutant human HNF-1β.

The construct pcDNA3.1-HNF-1β was prepared by cloning the type A human HNF-1β cDNA
(nucleotides 195-2783 inclusive, GenBank Accession No. X58840; SEQ ID NO:128) into pcDNA3.1
(Invitrogen, Carlsbad, CA). The R177X mutation was introduced by site-directed mutagenesis
(QuikChange mutagenesis kit; Stratagene, La Jolla, CA) to generate pcDNA3.1-HNF-1β-R177X. The
reporter gene construct pGL3-RA was prepared by cloning the promoter of the rat albumin gene,
nucleotides -170 to +5 (Ringeisen et al., 1993), into the firefly luciferase reporter vector pGL3-Basic
(Promega, Madison, WI). The sequences of all constructs were confirmed. HeLa cells were transfected
for 5 hr using LipofectAMINE (GIBCO BRL, Gaithersburg, MD) with 500 ng of pGL3-RA, 250 ng of
pcDNA3.1-HNF-1β or pcDNA3.1-HNF-1β-R177X, and 25 ng of pRL-SV40 to control for efficiency of
transfection. pcDNA3.1 DNA was added to each transfection so that the final amount of DNA added
was 2 μg. After 24 h, the transactivation activity of the normal and mutant HNF-1β proteins was
measured using the Dual-Luciferase Reporter Assay System (Promega, Madison, WI).
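
In a dual-reporter assay of this kind, the firefly signal is divided by the co-transfected control
signal, and constructs are then commonly compared as fold change over the empty vector. The sketch
below applies that second step to the mean normalized activities reported in Table 16 of this
example. Expressing the results as fold over empty vector is an illustrative choice, not something
the specification reports, and the dictionary keys are shortened labels.

```python
# Mean normalized activities (firefly / control luciferase) as reported in Table 16.
TABLE_16_MEANS = {
    "pcDNA3.1": 3.5,
    "pcDNA3.1-HNF-1beta": 25.1,
    "pcDNA3.1-R177X": 3.8,
    "pcDNA3.1-HNF-1beta + pcDNA3.1-R177X": 32.2,
}


def normalized_activity(firefly_signal, control_signal):
    """Firefly reporter signal corrected for transfection efficiency."""
    if control_signal <= 0:
        raise ValueError("control signal must be positive")
    return firefly_signal / control_signal


def fold_over_vector(means, vector="pcDNA3.1"):
    """Express each construct's mean activity relative to the empty vector."""
    baseline = means[vector]
    return {name: value / baseline for name, value in means.items()}


fold = fold_over_vector(TABLE_16_MEANS)
for name, value in fold.items():
    print(f"{name}: {value:.1f}-fold")
```

On these numbers the wild-type construct is about 7-fold over vector, the R177X construct is
indistinguishable from vector, and co-expression does not lower the wild-type activity.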

2. Results

The nine exons, flanking introns and minimal promoter region of the human HNF-1β gene (TCF2)
which encode all forms of HNF-1β were screened for mutations in 57 unrelated Japanese subjects with
MODY. This analysis revealed four nucleotide substitutions: a C→T substitution in codon 177 (exon 2) in
the proband from family J2-20 which generated a nonsense mutation CGA (Arg)→TGA (OP) (R177X) (FIG.
24), an uncommon silent mutation in codon 463 (exon 7) for which one subject was homozygous, and two
polymorphisms in intron 8 (Table 15), neither of which is predicted to affect RNA splicing. The nonsense
mutation R177X was not found on screening 53 unrelated nondiabetic Japanese subjects. One
nondiabetic subject was heterozygous for the silent mutation in codon 463 (Table 15).

Table 15

Mutations and DNA polymorphisms in human HNF-1β and DCoH genes

                                                          Frequency
Location      Site         Nucleotide change      Patients (n = 57)    Controls
A. HNF-1β
Exon 2        Codon 177    CGA(Arg)→TGA(OP)       C-0.99; T-0.01       C-1.00; T-0.00
Exon 7        Codon 463    GCC(Ala)→GCT(Ala)      C-0.98; T-0.02       C-0.99; T-0.01
Intron 8      nt 48        Insertion C            C-0.12               C-0.17
Intron 8      nt 22        C→T                    C-0.71; T-0.29       C-0.68; T-0.32
B. DCoH
Exon 4        nt 9306      A→G                    A-0.80; G-0.20       [one entry illegible]

DNA polymorphisms found in introns are noted relative to the splice donor or acceptor site; nt,
nucleotide. In the HNF-1β gene the C→T substitution in codon 463 and the C-insertion polymorphism in
intron 8 nt 48 result in the gain of a Dde I site and loss of a Nae I site, respectively. In the human DCoH gene
(GenBank accession no. L41560, incorporated herein by reference), the nt 9306 is in the region encoding
the 3'-untranslated region of DCoH mRNA and is 36 nucleotides after the translation termination codon.
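
The allele frequencies in Table 15 follow from simple chromosome counting, since each subject
carries two copies of the gene. The sketch below reproduces three of the tabulated values from
genotype counts. The counts are inferred from the text (one patient carrying R177X, assumed here to
be heterozygous; one patient homozygous for the codon 463 variant; one control heterozygous for it)
and assume no other carriers, which the specification does not state explicitly.

```python
def variant_allele_frequency(n_subjects, n_heterozygous, n_homozygous_variant):
    """Variant allele frequency from genotype counts in diploid subjects."""
    if n_heterozygous + n_homozygous_variant > n_subjects:
        raise ValueError("more carriers than subjects")
    chromosomes = 2 * n_subjects
    variant_alleles = n_heterozygous + 2 * n_homozygous_variant
    return variant_alleles / chromosomes


# Genotype counts below are assumptions inferred from the text.
r177x_patients = variant_allele_frequency(57, 1, 0)      # 1 of 114 chromosomes
codon463_patients = variant_allele_frequency(57, 0, 1)   # 2 of 114 chromosomes
codon463_controls = variant_allele_frequency(53, 1, 0)   # 1 of 106 chromosomes

for label, freq in [
    ("R177X, patients", r177x_patients),
    ("codon 463 T, patients", codon463_patients),
    ("codon 463 T, controls", codon463_controls),
]:
    print(f"{label}: {freq:.2f}")
```

Rounded to two decimals these give 0.01, 0.02 and 0.01, matching the T allele frequencies listed
for exon 2 (patients) and exon 7 (patients and controls).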

Family J2-20 shows bilineal inheritance of diabetes (FIG. 25). The R177X mutation, which was
maternally inherited, is associated with early-onset NIDDM, progression to insulin treatment and severe
complications. The earlier age at diagnosis in the proband and her brother may be due to the inheritance
of diabetes-susceptibility genes from both parents. The paternal diabetes gene which may potentiate the
effect of the HNF-1β mutation is unknown but is not another known MODY gene as mutations were not
found in the HNF-1α, HNF-4α and glucokinase genes of the proband (Iwasaki et al., 1997; Furuta
et al., 1997; Iwasaki et al., [year illegible]). The proband's older brother had been healthy until developing a
common cold and died one week later of diabetic ketoacidosis. The proband's maternal grandparents,
both of whom are deceased, were not known to have diabetes. However, she has a maternal uncle with
mild diet-controlled NIDDM diagnosed at 60 years of age. The difference in phenotype between the
proband's mother and maternal uncle and the absence of diabetes in the maternal grandparents suggest
that the R177X mutation may represent a new mutation in the proband's mother. The father and two
paternal uncles have late-onset NIDDM treated with oral hypoglycemic agents. The proband's paternal
grandmother was reported to have had diabetes. The presence of MODY and late-onset NIDDM within
the same family is not unusual and has been reported previously (Bell et al., 1991). With respect to the
presence of nephropathy in the subjects with the R177X mutation in HNF-1β, it is interesting to note
that HNF-1β is expressed at highest levels in kidney (Mendel et al., 1991a; De Simone et al., 1991;
Rey-Campos et al., 1991; Bach and Yaniv, 1993; Lazzaro et al., 1992) and perhaps decreased
levels of this transcription factor contribute to renal dysfunction.

HNF-1β contains a bipartite DNA binding region consisting of a POU-like element and a
homeodomain (Mendel et al., 1991a; De Simone et al., 1991; Rey-Campos et al., 1991; Bach and
Yaniv, 1993). The R177X mutation is located at the end of the POU-like domain and generates a protein
of 176 amino acids having the NH2-dimerization and POU domains (Cereghini, 1996; Mendel et al.,
1991a; De Simone et al., 1991; Rey-Campos et al., 1991; Bach and Yaniv, 1993). This truncated
protein cannot stimulate transcription of a rat albumin promoter-linked reporter gene and does not inhibit
the activity of wild-type HNF-1β (Table 16). This suggests that the R177X mutation represents a loss of
function mutation which results in decreased HNF-1β levels and a corresponding reduction in expression
of HNF-1β target genes.
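
The two exonic changes discussed in this example differ in consequence because of where the C→T
substitution falls within the codon. The following sketch classifies both changes using a
deliberately tiny codon table that covers only the codons involved; the helper names are
hypothetical and the table is not a full genetic code.

```python
# Only the codons needed for this illustration; TGA is the opal (OP) stop codon.
CODON_TO_AA = {"CGA": "Arg", "TGA": "Stop", "GCC": "Ala", "GCT": "Ala"}


def substitute(codon, position, new_base):
    """Return the codon with the base at a 1-based position replaced."""
    if len(codon) != 3 or not 1 <= position <= 3:
        raise ValueError("expected a 3-base codon and a position from 1 to 3")
    bases = list(codon)
    bases[position - 1] = new_base
    return "".join(bases)


def classify(ref_codon, alt_codon):
    """Classify a single-codon change as nonsense, silent, or missense."""
    ref_aa = CODON_TO_AA[ref_codon]
    alt_aa = CODON_TO_AA[alt_codon]
    if alt_aa == "Stop" and ref_aa != "Stop":
        return "nonsense"
    if alt_aa == ref_aa:
        return "silent"
    return "missense"


def truncated_length(stop_codon_number):
    """Residues retained when translation stops at the given codon number."""
    return stop_codon_number - 1


codon_177 = substitute("CGA", 1, "T")   # first base C -> T
codon_463 = substitute("GCC", 3, "T")   # third base C -> T
print(codon_177, classify("CGA", codon_177), truncated_length(177))
print(codon_463, classify("GCC", codon_463))
```

A stop at codon 177 leaves residues 1 through 176, which is the 176-amino-acid product described
above, whereas the third-position change in codon 463 leaves alanine unchanged.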

Table 16.

Transactivation activity of human HNF-1β and R177X mutation.

Construct                              Normalized Activity
                                       (Firefly luciferase/Renilla luciferase)
pcDNA3.1                               3.5 ± 0.5
pcDNA3.1-HNF-1β                        25.1 ± 3.2
pcDNA3.1-R177X                         3.8 ± 1.0
pcDNA3.1-HNF-1β + pcDNA3.1-R177X       32.2 ± 2.8

The activity of each construct was measured in triplicate and the mean ± SD is shown. These
results are representative of at least two independent experiments.

Human DCoH is a protein of 104 amino acids (including the initiating methionine) (Thony et al.,
1995). Exons 2-4 which encode amino acids 2-104 were screened for mutations in the 57 unrelated
Japanese subjects with MODY described above. The sequences were identical to one another except for
an A→G polymorphism located in the 3'-untranslated region (Table 15), the frequency of which was not
different between MODY and nondiabetic subjects. Thus, mutations in DCoH do not appear to contribute
to the development of MODY in Japanese.

The frequency of HNF-1β mutations in the inventors' study population of Japanese subjects with
MODY is 2% (1/57) which is the same as for mutations in HNF-4α (Furuta et al., 1997) whereas the
frequency of HNF-1α mutations is about 8% (Iwasaki et al., 1997) (the frequency of glucokinase
mutations in this sample is unknown). However, genetic variation in HNF-1β or DCoH is unlikely to be a
major factor contributing to the more common late-onset NIDDM as there is no evidence for linkage of
markers adjacent to these genes with diabetes in Japanese or Mexican American affected sib pairs
(Hanis et al., 1996).

The association of a mutation in HNF-1β with diabetes indicates the importance of the HNF
regulatory network in determining pancreatic β-cell function. Moreover, HNF-1α is not able to compensate
for the reduction in HNF-1β activity, implying that the primary target genes for these transcription factors
in pancreatic β-cells are different. The identification of these target genes will provide a better
understanding of the molecular mechanisms that determine normal β-cell function and may lead to new
approaches for treating diabetes.

EXAMPLE 9

Elucidation of the Genes Responsible for Additional MODY Disease States
The inventors have identified that various MODY type diabetes disease states are caused by
mutations in various HNF proteins in the diseased individuals. However, the inventors are also aware of
families that exhibit classic "MODY" disease states that are not caused by mutations in HNF-1α, HNF-1β,
or HNF-4α. Therefore, one aspect of this invention is to continue to screen the genetic complement of
these families to determine the genes that cause these additional MODY disease states. Such screening
can be done in the manner successfully used by the inventors to screen for the causes of MODY1,
MODY2, and MODY3. One of ordinary skill will be able and motivated, in view of the teachings of this
application, to work towards elucidating genes that, when mutated, cause additional MODY disease
states. Once such genes are elucidated, all diagnostic, treatment, and other aspects of the
invention will be realizable by those of skill in the art for those additional MODY causations. In order to
achieve these aspects of the invention, one will simply have to modify procedures and protocols taught in
this specification to be appropriate to the specific gene determined to cause a MODY disease state.

♦ ♦ ♦ 

All of the compositions and/or methods disclosed and claimed herein can be made and executed 
without undue experimentation in light of the present disclosure. While the compositions and methods of 
this invention have been described in terms of preferred embodiments, it will be apparent to those of skill 
in the art that variations may be applied to the compositions and/or methods and in the steps or in the 
sequence of steps of the method described herein without departing from the concept, spirit and scope of 
the invention. More specifically, it will be apparent that certain agents which are both chemically and
physiologically related may be substituted for the agents described herein while the same or similar
results would be achieved. All such similar substitutes and modifications apparent to those skilled in the
art are deemed to be within the spirit, scope and concept of the invention as defined by the appended
claims.

SEQUENCE LISTING



(1) GENERAL INFORMATION:

(i) APPLICANT:
(A) NAME: ARCH DEVELOPMENT CORPORATION
(B) STREET: 1101 EAST 58TH
(C) CITY: CHICAGO
(D) STATE: IL
(E) COUNTRY: US
(F) POSTAL CODE (ZIP): 60637
(G) TELEPHONE: (512) 418-3000
(H) TELEFAX: (713) 789-2675

(A) NAME: Graeme I. Bell
(B) STREET: Unknown
(C) CITY: Chicago
(D) STATE: IL
(E) COUNTRY: USA
(F) POSTAL CODE (ZIP): Unknown

(A) NAME: Kazuya Yamagata
(B) STREET: Unknown
(C) CITY: Chicago
(D) STATE: IL
(E) COUNTRY: USA
(F) POSTAL CODE (ZIP): Unknown

(A) NAME: Naohisha Oda
(B) STREET: Unknown
(C) CITY: Chicago
(D) STATE: IL
(E) COUNTRY: USA
(F) POSTAL CODE (ZIP): Unknown

(A) NAME: Pamela J. Kaisaki
(B) STREET: Unknown
(C) CITY: Chicago
(D) STATE: IL
(E) COUNTRY: USA
(F) POSTAL CODE (ZIP): Unknown

(A) NAME: Hiroto Furuta
(B) STREET: Unknown
(C) CITY: Chicago
(D) STATE: IL
(E) COUNTRY: USA
(F) POSTAL CODE (ZIP): Unknown

(A) NAME: Stephen Menzel
(B) STREET: Unknown
(C) CITY: Chicago
(D) STATE: IL
(E) COUNTRY: USA
(F) POSTAL CODE (ZIP): Unknown

(ii) TITLE OF INVENTION: MUTATIONS IN THE DIABETES SUSCEPTIBILITY
GENES HEPATOCYTE NUCLEAR FACTOR (HNF) 1 ALPHA, HNF-1BETA
and HNF-4ALPHA

(iii) NUMBER OF SEQUENCES: 147

(iv) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 (EPO)

(vi) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: US Unknown
(B) FILING DATE: 09-SEP-1996

(vi) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: US 60/029,6[illegible]9
(B) FILING DATE: 30-OCT-1996

(vi) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: US 60/028,056
(B) FILING DATE: 02-OCT-1996

(vi) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: US 60/025,719
(B) FILING DATE: 10-SEP-1996

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3238 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 988
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = A, C, G, or T"

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: join(24..986, 990..1916)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

C..3=CCCTO .^..O.C.. .CC ..3 O.. TC. - CTC C.O 



1 5 



- - z v:. s s z zi IT. s ^ Z ^, IT. =s 



50
:TG A7C CAG GCA CTG GGT GAG CGG GGG CCG TAG CTG CTG

-Iv Pro Tyr Le-^ Leu Ala Gly G.; 



Leu ..e Gin Ala Leu Gly Glu Pro Glv =-o ""r" "^-^ 



CTG CTG GAG AAG GGG GAG TCG Tr;- r---- , — ^--^ ^ 

.a. .y. c:, ... se.- ll.; c^^A .^.^ === 



55 



GCT GAG CTG CCC AA*^ /-^^ r^-nr- r-,^^ „ 

^^"^ GAG ACT CGG GG^ -".^^ r--" nr.^ r-^^ 

=1. Le„ P„ .... 

70 

^ Z Z ^ Z S - 1- - 

85 



c'L' r.'. X -C CO ;^ CCC 3.C CTO CO 

9C ""^^ ^-^^ Lys Ala Val Val Glu 

ACC CTT CTG CAG GAG GAC CCG TG- rn-r n-n r-^~ 

r.r ... ... - - ..c 

lI"' ^""'^ ^""'^ ^''^ =AG GTG GTC GAT ACC AC- 

Tyr .e. .1. c „,3 Asn He Pro Gin Arg Glu Val Val Tsl T^r rH^ 

130 

GGC CTC AAC CAG TCC CAC CTG TCC CAA CAC 

Gly Leu Asn Gin Ser H^s Leu Se^ u r" '^'^ ''^^ 

140 ^^■'^ Gly Thr Pro 

-'^^ 150 

ATG AAG ACG CAG AAG CGG GCC GCC rrr Tur- r.n^ 

.V. r.. ... - - - - - c.. 

Ar^ g'?u l^n G^n 

170 ""^^ Ala Gly Gin Gly Gly Leu 

' 185 

nl G^^ Gf P^' ^^.^ ^-^^ -^^^ ^'^^ AAG GGG CGG AGG 

Glu Glu Pro Thr Gly Asp Glu Leu Pro Thr Lys Lys Gly Sg 

200 

Ts' Trl ir T"" ""^^ ''^^ ^AG ATC CTG TTC CAG GCC 

AS.. Arg Phe Lys Trp Gly Pro Ala Ser Gin Gin He Leu Phe Gin Ala 

210 215 

Tvr ^f' f ^AG CGA GAG ACG CTA GTG 
Tyr .lu Arg G.n Lys Asn Pro Ser Lys Glu Glu Arg Glu vI: 

230 

Tlu rf A""^ ^""'^ ^""^ ''^^ '^^^ ^AG AGA GGG GTG ^C^ CCA TCA 

Clu G u Cys A3n Arg Ala Glu Cys He Gin Arg Gly Val Se.: Pro Lr 
" 240 245 



242 



290 



336 



386 



434 



4 82 



530 



576 



626 



674 



722 



770
CAO OC. CAG GCC CTO CCC TCC AAC CTC CTC ACC CAO CTG CGT CTC TAC
Gln Ala Gln Gly Leu Gly Ser Asn Leu Val Thr Glu Val Arg Val Tyr
250 255 260 

Z VZ S Z Z Z IT. X S 7.1 z z ^= s= 

zi zi ?s z 3- s -I s I- 



285 



290 



..3 CTO CCC .CT C.C .OC TCC CCT 03C CT= CCT CC. CCT =CC CTC TCC 

Ala Leu Pro Ala His Ser Ser Pro Gly Leu Pro Pro f 
300 305 

I- Z Z ^ S - i 3- -I 

!!! 2^ z ^ z ^5 iz '^i s ;i= vi? I- 



330 



335 



340 



CCC CTC CC C« =T= TCC CCC .CC 0=C CTC C.C CCC .CC C.C .CC 

Thr Pro Leu His Gin Val Ser Pro Thr Gly Leu Giu Fr 
345 350 355 

CTG CTG AGT ACA GAA GCC AAG CTG GTC TCA GCA GCT GGG GGC CCC CTC 
Leu Leu Ser Thr Glu Ala Lys Leu Val Ser Ala Ala Gly Gly Pro Leu



365 



370 



CCC CCT CTC .CC .CC CTC »C. CC. CTC C.C .OC TTC C.C C.C .C. TCC 

Pro Pro val Ser Thr Leu Thr Ala Leu His &er i.eu 
380 385 

CCA GGC CTC AAC CAG CAG CCC CAG AAC CTC ATC ATG GCC TCA CTT CCT
Pro Gly Leu Asn Gln Gln Pro Gln Asn Leu Ile Met Ala Ser Leu Pro
395 400 

GGG GTC A-G ACC ATC GGG CCT GGT GAG CCT GCC TCC CTG GGT CCT ACG 
ITy ;il rll Zle Gly Pro Gly Glu Pro Ala Ser Leu Gly Pro Thr 



410 



415 



TTC ACC AAC ACA GGT GCC TCC ACC CTG GTC ATC GGC CTG GCC TCC ACG 
Phe tZ Tsn Thr Gly Ala Ser Thr Leu Val lie Gly Leu Ala Ser Thr 



425 



430 



435 



«= OC^ C^C .GT GTC CCG GTC .TC «C AGC ATG GGC .GC AGC CTG ACC 

Gin Ala Gin Ser Val Pro Val He Asn Ser Met Gly Ser Leu 



44S 



^CC CTG CAG CCC GTC CAG TTC TCC CAG CCG CTG CAC CCC TCC TAC CAG 
Thr Lu Gin pro Val Gin Phe Ser Gin Pro Leu Hxs Pro Ser Tyr Gin 



460 465 



CAG CCG CTC AT3 CCA OCT GT3 CAG AGC -A- r— - 

-xn Pre Leu Met Pro Pro Val Glr. Se>- h^s vl^ '"'^ '''^^ -~" 

1^5 Va. .nr G.n Ser Pro .-^ne 

•IS 5 

ATG GCC ACC ATG G"- CAG - - r- ^ 

... ... ... . 3„ - - C.C 

5 00 

AAG CCC GAG r--- 

^-.^ CAG TA^ AC- ^T- -r-" ^ 

'VQ D^-- - n . /^^^ v.rtC ACG GGC CT^ r—r- 

-ys o.u Val Axa Gin Tvr Thr H< ^ To ""^ 

^-^^ Tnr Gly Leu Leu Pro Gin Thr 

520 

ATG crc ATC ACC GAC ACC ACC AA" CTr iirr- r-r. 

Me. Leu lie Thr Asp Tnr Tnr ^tn L^u Ser Ala 17 I'' 

525 ^ ^^'^ Ser Leu Thr 



"° 535 



CCC ACC AAG CAG GTC TTC A-r- ^n- r-.^ 

Pro Thr Lys Gin Val ^he T^l sJr f f '^'^^ ^ ^^"^ 

540 ^^'^ ^l'-^ Ala Ser Ser 



545 ' 

550 



s° .^i: ."^ i;j - ?f 

555 3," ""-^ ^hr ^eu His Val Pro Ser 

56 5 



"° ir. ^ IT: s.^ fi= 

570 5_ ""'^ Ala Hxs Arg Leu Ser 

580 

GCC AGC CCC ACA GTG TCr -r— nr^^ 

Ala ser Pro Tnr Val 11^ lit f ^ 5^= ^"^^ "^AC CAG AGC TCA 

590 ^^'-^ -^^^ ser Ser 

-I s :: ^ ^ - 



615 



GAC TCC AGC AAT G 

- k_ 

ly Gin Ser His Leu 
°5 610 
ATC GAG ACC TTC A^- TP-- ar^- on.. . 

... 3. ... - - - - .cc .cc 

625 

T«c„..cc .cc.=.=ccc .=o«=cxc. .„.cc..cx Ta=c=cc.r......„,, 

AOCC^CCCr CC,=..CC.3 CC3«C«CC C.^^CCCXC CX3C.C..C. 

=T=ccxc.cx CCCC.CXCXO cxcx=™« xc^^aoc ..o=cxcx« .=c=cccc« 
ccccxca.=3 cx=cxc3.== x=c.c.=... .,3=xcox3. .«.cx.3a. =c«.accx3 

XXC.X0CC.3 .X=X.CO.C. C.CXCXCCCX OCXXC.X==. .X.C..XCXX cxx.cxx.« 

.cxa«=3.= =c==ccx.x. .cxx===c.c cccc.accxo .cccx.xco. o«cccx=cc 

ACC3CX.C.C C.CXCX=0„ .C„„CXXC X„==,„C. CCCCX.X3X. CCXC^^.C^ 
-XC.CCX.X ..0..3CCCX ==„C«.=X C3CCXX3X.C x.xc.c^x cx.cc«cc= 
-CC.CXC.X XCCX3...C. .CXC^CC. =CX.OX=.CC C.C.X3CC.X xx.x..x3.e 



i5e 



ACT GAG TCC i68; 



1"3 0 
CCCATCACCT ACTCACACAG GCATTTCCTG GGTGGCTACT CTGTGCCAGA GCCTGGGGCT    2516
CTAACTGCCT GAGCCCAGGG AGGCCGAAGC TAACAGGGAA GGCAGGCAGG GCTCTCCTGG    2576
TCTTCCCATC CCCAGCGATT CCCTCTCCCA GGCCCCATGA CCTCCAGCTT TCCTGTATTT    2636
CTTCCCAAGA GCATGATGCC TCTGAGGCCA GCCTGGCCTC CTGCCTCTAC TGGGAAGGCT    2696
ACTTCGGGGC TGGGAAGTCG TCCTTACTCC TGTGGGAGCC TCGCAACCCG TGCCAAGTCC    2756
AGGTCCTGGT GGGGCAGCTC CTCTGTCTCG AGCGCCCTGC AGACCCTGCC CTTGTTTGGG    2816
GCAGGAGTAG CTGAGCTCAC AAGGCAGCAA GGCCCGAGCA GCTGAGCAGG GCCGGGGAAC    2876
TGGCCAAGCT GAGGTGCCCA GGAGAAGAAA GAGGTGACCC CAGGGCACAG GAGCTACCTG    2936
TGTGGACAGG ACTAACACTC AGAAGCCTGG GTGCCTGGCT GGCTGAGGGC AGTTCGCAGC    2996
CACCCTGAGG AGTCTGAGGT CCTGAGCACT GCCAGGAGGG ACAAAGGAGC CTGTGAACCC    3056
AGGACAAGCA TGGTCCCACA TCCCTGGGCC TGCTGCTGAG AACCTGGCCT TCAGTGTACC    3116
GCGTCTACCC TGGGATTCAG GAAAAGGCCT GGGGTGACCC GGCACCCCCT GCAGCTTGTA    3176
GCCAGCCGGG GCGAGTGGCA CGTTTATTTA ACTTTTAGTA AAGTCAAGGA GAAATGCGGT    3236
GG                                                                   3238

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 630 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Met Val Ser Lys Leu Ser Gln Leu Gln Thr Glu Leu Leu Ala Ala Leu
1               5                   10

Leu Glu Ser Gly Leu Ser Lys Glu Ala Leu Ile Gln Ala Leu Gly Glu
            20

Pro Gly Pro Tyr Leu Leu Ala Gly Glu Gly Pro Leu Asp Lys Gly Glu
        35

Ser Cys Gly Gly Gly Arg Gly Glu Leu Ala Glu Leu Pro Asn Gly Leu
    50

Gly Glu Thr Arg Gly Ser Glu Asp Glu Thr Asp Asp Asp Gly Glu Asp
65

Phe Thr Pro Pro Ile Leu Lys Glu Leu Glu Asn Leu Ser Pro Glu Glu
                85                  90
Ala Ala His Gir. Lys Ala Val Val Glu vhr Le^' Leu . 

^ 1 1 C' 

Trp Arg Va' Ala Lys Met Val Lys Ser Tv- Leu r^^ u 

lis - "is Asn He 

Pro Gin Arg Glu Val Val Asp Th^ r^r Glv t « . 

130 ^"^ "^^y Leu Asn Gin Ser His Leu 

14G 

Se. ^^^^^ 

15 5 

AI. .eu r... Trp Tyr Val Arg Lys Gin Arg Glu Val Ala CIn Gin 

170 

Phe Thr His Ala Gly Gin Gly Glv Leu T'e p. „ n 

180 - f^, '^^^ Thr Gly Asp 

-^^ 190 

Ol: ... P„ ... .,3 ^, ^.^^ 

^ U 0 2 0 5 

- se, =:„ 

"^^ 220 

^ys 01. ^^^^ 

2 3 5 

cy. ci. =iy v.a s,. p„ s.. ci„ 2" 

250 

255 

... ^^^^ 

V 270 

... ciu ... p,, 

285 

P« P„ 

"^^^ 300 

pro Gly Leu Pro P.o Pro Ala Leu Ser Pro Ser Lys Val Hxs Oly Val 

3 1 S 

-^^^ 320 
A., 31, al„ P.0 «a TH. 3,r Thr 31u Val P.o s=.- s„ s.- 

330 

.re ... „.i 3„ 
Th. a., ^ <3i„ 3„ 

365 

V.l =er „. 

^ ' ^ 380 
^eu H.s ser Leu Glu Gla T.r Ser Pro Gly Leu Asn Gin Oln Pro Gin 



395 



400 
Asn Leu Ile Met Ala Ser Leu Pro Gly Val Met Thr Ile Gly Pro Gly
                405                 410                 415

Glu Pro Ala Ser Leu Gly Pro Thr Phe Thr Asn Thr Gly Ala Ser Thr
            420                 425                 430

Leu Val Ile Gly Leu Ala Ser Thr Gln Ala Gln Ser Val Pro Val Ile
        435                 440                 445

Asn Ser Met Gly Ser Ser Leu Thr Thr Leu Gln Pro Val Gln Phe Ser
    450                 455                 460

Gln Pro Leu His Pro Ser Tyr Gln Gln Pro Leu Met Pro Pro Val Gln
465                 470                 475                 480

Ser His Val Thr Gln Ser Pro Phe Met Ala Thr Met Ala Gln Leu Gln
                485                 490                 495

Ser Pro His Ala Leu Tyr Ser His Lys Pro Glu Val Ala Gln Tyr Thr
            500                 505                 510

His Thr Gly Leu Leu Pro Gln Thr Met Leu Ile Thr Asp Thr Thr Asn
        515                 520                 525

Leu Ser Ala Leu Ala Ser Leu Thr Pro Thr Lys Gln Val Phe Thr Ser
    530                 535                 540

Asp Thr Glu Ala Ser Ser Glu Ser Gly Leu His Thr Pro Ala Ser Gln
545                 550                 555                 560

Ala Thr Thr Leu His Val Pro Ser Gln Asp Pro Ala Gly Ile Gln His
                565                 570                 575

Leu Gln Pro Ala His Arg Leu Ser Ala Ser Pro Thr Val Ser Ser Ser
            580                 585                 590

Ser Leu Val Leu Tyr Gln Ser Ser Asp Ser Ser Asn Gly Gln Ser His
        595                 600

Leu Leu Pro Ser Asn His Ser Val Ile Glu Thr Phe Ile Ser Thr Gln
    610                 615                 620

Met Ala Ser Ser Ser Gln
625                 630



(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3238 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: modified_base
ACG GAC GAC GAT GGG GAA GAC TTC ACG CCA CCC ATP r-- at. 

Thr Aqr> Ac:r^ /-i ^^^^ CTG 

Thr ASP Asp Asp Gly G.u Asp Phe Thr Pro Pro He Leu Lys Giu Leu 

80 g5 

GAG AAC CTC AGC CCT GAG GAG GCG GCC CAC CAG AAA GC- GTr rr- n^n 
Clu Asn Leu Ser Pro GIu Glu Ala Ala H.s Gin Si vl^ vll otu 



100 



105 



?hr t""" ''f'' "''^ ^""^ ''^^ AAG ATG GTC AAG TCC 

Thr Leu Leu Gin Glu Asp Pro Trp Arg Val Ala Lys Met Va^ t^s Ser 

1^1 CCA CAG CAG GAG GTG GTC GAT ACC ACT 

Tyr Leu Gin G n H.s Asn He Pro Gin Gin Glu Val Val AsJ Jhr TH^ 
125 

GGC CTC AAC CAG TCC CAC CTG TCC CAA CAC CTC AAC AAG GGC A-T CCC 
aiy Leu Asn Gin Ser H.s Leu Ser Gin H.s Leu Asn Lys Gly rUr Pro 



145 



150 



ATG AAG ACG CAG AAG CGG GCC GCC CTG TAC ACC T-r -^r 

"e. Lv, X... «„ III "° 

165 



50 



(B) LOCATION: 988

(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = A, C, G, or T"

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: join(24..986, 990..1916)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

CGTGGCCCTG TGGCAGCP^^ r.'^n n'-'- r-r^rr^ 

i^CA^^C-..-. G.C A.o GiT TCT AAA CTG AGC CAG CTG CAG 

Met Val Ser Lys Leu Ser Gin Leu Gin 

1 5 

ACG GAG CTC CTG GCG GCC r-r^ r-rn r^.^ t,^, ^ 

... t:l ^: ^ ^ - - 

15 20 25 

CTG ATC CAG GCA CTG GGT GAG CCG GGG CCC TAC CT- -Tr- '--^r r-A n.r 
-eu :ie Gin Ala Leu Gly Glu Pro Gly Pro Tyr L^I^ tly 

3 5 

5 0 

s ^ I- - s s= s =J= 

70 






CAG CGA GAG GTG GCG CAG CAG TTC ACC CAT CCA rr- r-nr^ 

... v.: ... o.. Pn. ™^ S^I Ty 
ATT GAA GAG CCC ACA GGT GAT GAG CTA CCA ACC AAG AAG GGG CGG AGG    626
Ile Glu Glu Pro Thr Gly Asp Glu Leu Pro Thr Lys Lys Gly Arg Arg
                190                 195

AAC CGT TTC AAG TGG GGC CCA GCA TCC CAG CAG ATC CTG TTC CAG GCC    674
Asn Arg Phe Lys Trp Gly Pro Ala Ser Gln Gln Ile Leu Phe Gln Ala
            205                 210                 215

TAT GAG AGG CAG AAG AAC CCT AGC AAG GAG GAG CGA GAG ACG CTA GTG    722
Tyr Glu Arg Gln Lys Asn Pro Ser Lys Glu Glu Arg Glu Thr Leu Val
        220                 225                 230

GAG GAG TGC AAT AGG GCG GAA TGC ATC CAG AGA GGG GTG TCC CCA TCA    770
Glu Glu Cys Asn Arg Ala Glu Cys Ile Gln Arg Gly Val Ser Pro Ser
    235                 240                 245

CAG GCA CAG GGG CTG GGC TCC AAC CTC GTC ACG GAG GTG CGT GTC TAC    818
Gln Ala Gln Gly Leu Gly Ser Asn Leu Val Thr Glu Val Arg Val Tyr
250                 255                 260                 265

AAC TGG TTT GCC AAC CGG CGC AAA GAA GAA GCC TTC CGG CAC AAG CTG    866
Asn Trp Phe Ala Asn Arg Arg Lys Glu Glu Ala Phe Arg His Lys Leu
                270                 275                 280

GCC ATG GAC ACG TAC AGC GGG CCC CCC CCA GGG CCA GGC CCG GGA CCT    914
Ala Met Asp Thr Tyr Ser Gly Pro Pro Pro Gly Pro Gly Pro Gly Pro
            285                 290                 295

GCG CTG CCC GCT CAC AGC TCC CCT GGC CTG CCT CCA CCT GCC CTC TCC    962
Ala Leu Pro Ala His Ser Ser Pro Gly Leu Pro Pro Pro Ala Leu Ser
        300                 305                 310

CCC AGT AAG GTC CAC GGT GTG CGC TNT GGA CAG CCT GCG ACC AGT GAG    1010
Pro Ser Lys Val His Gly Val Arg     Gly Gln Pro Ala Thr Ser Glu
    315                 320                     325

ACT GCA GAA GTA CCC TCA AGC AGC GGC GGT CCC TTA GTG ACA GTG TCT    1058
Thr Ala Glu Val Pro Ser Ser Ser Gly Gly Pro Leu Val Thr Val Ser
    330                 335                 340

ACA CCC CTC CAC CAA GTG TCC CCC ACG GGC CTG GAG CCC AGC CAC AGC    1106
Thr Pro Leu His Gln Val Ser Pro Thr Gly Leu Glu Pro Ser His Ser
345                 350                 355                 360

CTG CTG AGT ACA GAA GCC AAG CTG GTC TCA GCA GCT GGG GGC CCC CTC    1154
Leu Leu Ser Thr Glu Ala Lys Leu Val Ser Ala Ala Gly Gly Pro Leu
                365                 370                 375

CCC CCT GTC AGC ACC CTG ACA GCA CTG CAC AGC TTG GAG CAG ACA TCC    1202
Pro Pro Val Ser Thr Leu Thr Ala Leu His Ser Leu Glu Gln Thr Ser
            380                 385

CCA GGC CTC AAC CAG CAG CCC CAG AAC CTC ATC ATG GCC TCA CTT CCT    1250
Pro Gly Leu Asn Gln Gln Pro Gln Asn Leu Ile Met Ala Ser Leu Pro
        395                 400                 405
GGG GTC ATG ACC ATC GG3 CC 



■oo. GAG CCT G::C TCC G7G GGT AC^, ^-o- 

* .(^^ A.i^ -^xe ^jXV r^rc 
^ 415 



Gly Vai Met Thr lie Gl- ^ • 

^x, ^^^^ p^^ ^^^^ ^^.^ 



4 

TTC ACC AAC ACA GGT GCC TCC ACC CTG GTC ATC G 
425 "^^^ "^^^ '^^^ ^'"''^ ^^'^ 



Phe Thr Asn Thr Gly Ala Se- "^'^ '"^^ '^^^ ""^^^ 

^^^^ 435 

4i5 44Q 

?n a"^ r^' f ''^^ ^'"^ ACC 

Gxn A.a G.n Ser Val Pro Val He Asn Ser Met Gly Ser Ser Leu Jhr 



445 



450 



455 



ACC CTG CAG CCC GTC CAG TTC TCC CAG CCG CTG CAC CCC TCC TAC CAG    1442
Thr Leu Gln Pro Val Gln Phe Ser Gln Pro Leu His Pro Ser Tyr Gln
            460                 465                 470

CAG CCG CTC ATG CCA CCT GTG CAG AGC CAT GTG ACC CAG AGC CCC TTC    1490
Gln Pro Leu Met Pro Pro Val Gln Ser His Val Thr Gln Ser Pro Phe
                            480                 485

ATG GCC ACC ATG GCT CAG CTG CAG AGC CCC CAC GCC CTC TAC AGC CAC    1538
Met Ala Thr Met Ala Gln Leu Gln Ser Pro His Ala Leu Tyr Ser His

AAG CCC GAG GTG GCC CAG TAC ACC CAC ACG GGC CTG CTC CCG CAG ACT    1586
Lys Pro Glu Val Ala Gln Tyr Thr His Thr Gly Leu Leu Pro Gln Thr
                                        515                 520

ATG CTC ATC ACC GAC ACC ACC AAC CTG AGC GCC CTG GCC AGC CTC ACG    1634
Met Leu Ile Thr Asp Thr Thr Asn Leu Ser Ala Leu Ala Ser Leu Thr
                                    530

CCC ACC AAG CAG GTC TTC ACC TCA GAC ACT GAG GCC TCC AGT GAG TCC    1682
Pro Thr Lys Gln Val Phe Thr Ser Asp Thr Glu Ala Ser Ser Glu Ser
            540

GGG CTT CAC ACG CCG GCA TCT CAG GCC ACC ACC CTC CAC GTC CCC AGC    1730
Gly Leu His Thr Pro Ala Ser Gln Ala Thr Thr Leu His Val Pro Ser
                            560

CAG GAC CCT GCC GGC ATC CAG CAC CTG CAG CCG GCC CAC CGG CTC AGC    1778
Gln Asp Pro Ala Gly Ile Gln His Leu Gln Pro Ala His Arg Leu Ser
                        575                 580

GCC AGC CCC ACA GTG TCC TCC AGC AGC CTG GTG CTG TAC CAG AGC TCA    1826
Ala Ser Pro Thr Val Ser Ser Ser Ser Leu Val Leu Tyr Gln Ser Ser

GAC TCC AGC AAT GGC CAG AGC CAC CTG CTG CCA TCC AAC CAC AGC GTC    1874
Asp Ser Ser Asn Gly Gln Ser His Leu Leu Pro Ser Asn His Ser Val
                605                 610                 615

ATC GAG ACC TTC ATC TCC ACC CAG ATG GCC TCT TCC TCC CAG            1916
Ile Glu Thr Phe Ile Ser Thr Gln Met Ala Ser Ser Ser Gln
            620                 625                 630

TAACCACGGC ACCTGGGCCC TGGGGCCTGT ACTGCCTGCT TGGGGGGTGA TGAGGGCAGC    1976
AGCCAGCCCT GCCTGGAGGA CCTGAGCCTG CCGAGCAACC GTGGCCCTTC CTGGACAGCT    2036
GTGCCTCGCT CCCCACTCTG CTCTGATGCA TCAGAAAGGG AGGGCTCTGA GGCGCCCCAA    2096
CCCGTGGAGG CTGCTCGGGG TGCACAGGAG GGGGTCGTGG AGAGCTAGGA GCAAAGCCTG    2156
TTCATGGCAG ATGTAGGAGG GACTGTCGCT GCTTCGTGGG ATACAGTCTT CTTACTTGGA    2216
ACTGAAGGGG GCGGCCTATG ACTTGGGCAC CCCCAGCCTG GGCCTATGGA GAGCCCTGGG    2276
ACCGCTACAC CACTCTGGCA GCCACACTTC TCAGGACACA GGCCTGTGTA GCTGTGACCT    2336
GCTGAGCTCT GAGAGGCCCT GGATCAGCGT GGCCTTGTTC TGTCACCAAT GTACCCACCG    2396
GGCCACTCCT TCCTGCCCCA ACTCCTTCCA GCTAGTGACC CACATGCCAT TTGTACTGAC    2456
CCCATCACCT ACTCACACAG GCATTTCCTG GGTGGCTACT CTGTGCCAGA GCCTGGGGCT    2516



CTAACTGCCT GAGCCCAGGG AGGCCGAAGC TAACAGGGAA GGCAGGCAGG GCTCTCCTGG    2576
TCTTCCCATC CCCAGCGATT CCCTCTCCCA GGCGCCATGA CGTCCAGCTT TCCTGTATTT    2636
CTTCCCAAGA GCATGATGCC TCTGAGGCCA GCCTGGCCTC CTGCCTCTAC TGGGAAGGCT    2696
ACTTCGGGGC TGGGAAGTCG TCCTTACTCC TGTGGGAGCC TCGCAACCGG TGCCAAGTCC    2756
AGGTCGTGGT GGGGCAGCTC CTGTGTCTCG AGCGCCCTGC AGACCCTGCC CTTGTTTGGG    2816
GCAGGAGTAG CTGAGCTCAC AAGGCAGCAA GGCGCGAGCA GCTGAGCAGG GGCGGGGAAC    2876
TGGCCAAGCT GAGGTGCCCA GGAGAAGAAA GAGGTGACCC CAGGGCAGAG GAGCTACCTG    2936
TGTGGACAGG ACTAACACTC AGAAGCCTGG GTGCCTGGCT GGCTGAGGGC AGTTCGCAGC    2996
GACCCTGAGG AGTCTGAGGT CCTGAGCACT GCCAGGAGGG AGAAAGGAGC CTGTGAACCC    3056
AGGACAAGCA TGGTCCCACA TCGCTGGGCG TGCTGCTGAG AACCTGGCGT TCAGTGTACC    3116
GCGTCTACCC TGGGATTCAG GAAAAGGCCT GGGGTGACCC GGCACCCCCT GGAGGTTGTA    3176
GCCAGCCGGG GCGAGTGGCA CGTTTATTTA ACTTTTAGTA AAGTCAAGGA GAAATGGGGT    3236
GG                                                                   3238

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 630 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
Leu Gl. Ser Oly Leu Ser Lys Glu Al. Leu I.e Gin AI. Le 



Gl- 

0 

^ ' 30 



Pro Gly Pre Tvr Leu Leu a r^v r-^ 

3 5 ^ ^-o Leu Asp ^ys G^y Giu 

4 5 

Ser Cys Glv Gx v v 

Leu Pro Asn Gly Le; 
60 



'^rq G^y Glu Leu Ala Glu L 



Gly Glu Thr Ar- 

' ^'-^ Glu Asc 



-t) Gly Ser Glu Asp Glu 
70 

80 

Phe Thr Pro Pro 11^ 



Leu Lys Glu Leu Glu 



65 Ser Pro Glu Glu 

Ala Ala H.s Gin Lys Ala Val Val Glu Thr Leu Leu Gin ^- . 

100 ^-^^ Asp Pro 

110 

Trp Arg Val Ala Lys Met Val Lys s^r t--^ - 

115 ^^^^ --n Gin His Asn He 

125 

Pro Cl„ =1„ .1, 

140 

»r Cl„ H„ ... .3„ =1, 

15 5 

Ala .eu ry. XH. Trp Tyr Val A., Lys Gin Arg Glu Val Ala Gin oin 

■^"^^ 175 



... H.. ^ 

190 

Glu Leu Pro Thr Lys Lys Gly Ara Ara Asr Arc Ph. t ^ 

195 -,„r ' Phe Lys Trp Gly Pro 

205 

Ala 01„ =1„ I,, 

^-^^ 220 
ser Lys Glu Glu Arg Olu T.r Leu Val Glu Glu Cys Asn Arg Ala Glu 

240 

cys He Gin Arg Gly Val Ser Pro Ser Gin Ala Gin 

230 



— ... ocr ^.n Ala Gin Gly Leu Glv Ser 

255 

Asn Leu Val Thr Glu Val Arg Val Ty. as-^ Tn. pv,. ^i . 

26 0 265 " ^""3 



270 



.X. =1. =1. p,, ^^^^ 

285 

P.C P.O p„ 

^ 300 
305 310

315 320

Arg Gly Gln Pro Ala Thr Ser Glu Thr Ala Glu Val Pro Ser Ser Ser
                325                 330                 335

Gly Gly Pro Leu Val Thr Val Ser Thr Pro Leu His Gln Val Ser Pro
            340                 345                 350

Thr Gly Leu Glu Pro Ser His Ser Leu Leu Ser Thr Glu Ala Lys Leu
        355                 360                 365

Val Ser Ala Ala Gly Gly Pro Leu Pro Pro Val Ser Thr Leu Thr Ala
    370                 375                 380

Leu His Ser Leu Glu Gln Thr Ser Pro Gly Leu Asn Gln Gln Pro Gln
385                 390                 395                 400

Asn Leu Ile Met Ala Ser Leu Pro Gly Val Met Thr Ile Gly Pro Gly
                405                 410                 415

Glu Pro Ala Ser Leu Gly Pro Thr Phe Thr Asn Thr Gly Ala Ser Thr
            420                 425                 430

Leu Val Ile Gly Leu Ala Ser Thr Gln Ala Gln Ser Val Pro Val Ile
        435                 440                 445

Asn Ser Met Gly Ser Ser Leu Thr Thr Leu Gln Pro Val Gln Phe Ser
    450                 455                 460

Gln Pro Leu His Pro Ser Tyr Gln Gln Pro Leu Met Pro Pro Val Gln
465                 470                 475                 480

Ser His Val Thr Gln Ser Pro Phe Met Ala Thr Met Ala Gln Leu Gln
                485                 490                 495

Ser Pro His Ala Leu Tyr Ser His Lys Pro Glu Val Ala Gln Tyr Thr
            500                 505                 510

His Thr Gly Leu Leu Pro Gln Thr Met Leu Ile Thr Asp Thr Thr Asn
        515                 520                 525

Leu Ser Ala Leu Ala Ser Leu Thr Pro Thr Lys Gln Val Phe Thr Ser
    530                 535                 540

Asp Thr Glu Ala Ser Ser Glu Ser Gly Leu His Thr Pro Ala Ser Gln
545                 550                 555                 560

Ala Thr Thr Leu His Val Pro Ser Gln Asp Pro Ala Gly Ile Gln His
                565                 570                 575

Leu Gln Pro Ala His Arg Leu Ser Ala Ser Pro Thr Val Ser Ser Ser
            580                 585                 590

Ser Leu Val Leu Tyr Gln Ser Ser Asp Ser Ser Asn Gly Gln Ser His
        595                 600                 605

Leu Leu Pro Ser Asn His Ser Val Ile Glu Thr Phe Ile Ser Thr Gln
                        615

Met Ala Ser Ser Ser Gln
                    630



(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3239 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: modified_base
(B) LOCATION: 989

(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = A, C, G, or T"

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 24..965

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

CGTGGCCCTG TGGCAGCCGA GCC ATG GTT TCT AAA CTG AGC CAG CTG CAG 

Met Val Ser Lys Leu Ser Gin Leu Cn 
1 5 

?hr ''f^ "^"■'^ ^^^^ AAA GAG GCA 

Thr G.. .eu Leu Ala Ala Leu Leu Glu Ser Gly Leu Ser Lys Clu Til 

20 25 

CTG ATC CAG GCA CTG GGT GAG CCG GGG CCC TAG CT^ CTP pr- n-r. 
I..U II. 31„ ^= aA. 



35 



40 



G?v pf ""^^ GGG GAG CTG 

Gly Pro Leu Asp Lys Gly Glu Ser Cys Gly Gly Giy Arg Gly Glu lIu 



50 



55 



GCT GAG CTG CCC AAT GGG CTG GGG GAG ACT CGG GGC TCC GAG GAC CAP 
Ala Glu Leu Pro Asn Gly Leu Gly GI. T.r Arg Gly Ser G^u A^p G^u 

70 

ACG GAC GAC GAT GGG GAA GAC TTC ACG CCA CCC ATC CTC AAA GA- CTC 
Thr ASP ASP ASP Gly Glu Asp Phe Thr Pro Pro He lIu oiZ Je 

80 85 ' ^ 

tlu Tf^ T"" "^'^ '^'^^ ^"^^ ^ c;tg gtg gag 

Glu Asn .eu Ser Pro Glu Glu Ala Ala His Gin Lys Ala Val Val Gl! 

Ttr lIv T"" ''^^ ^^"^ ^-^^ GTC AAG T-C 

Thr Leu Leu Gin Glu Asp Pro Trp Ar. Val Ala Lys Met Val Lys Ser 

120 



TAC CTG CAG CAG CAC AAC ATC CCA CAG CGG GAG GTG GTC GAT ACC ACT    434
Tyr Leu Gln Gln His Asn Ile Pro Gln Arg Glu Val Val Asp Thr Thr
            125                 130

GGC CTC AAC CAG TCC CAC CTG TCC CAA CAC CTC AAC AAG GGC ACT CCC    482
Gly Leu Asn Gln Ser His Leu Ser Gln His Leu Asn Lys Gly Thr Pro
        140                 145                 150

ATG AAG ACG CAG AAG CGG GCC GCC CTG TAC ACC TGG TAC GTC CGC AAG    530
Met Lys Thr Gln Lys Arg Ala Ala Leu Tyr Thr Trp Tyr Val Arg Lys
    155                 160                 165

CAG CGA GAG GTG GCG CAG CAG TTC ACC CAT GCA GGG CAG GGA GGG CTG    578
Gln Arg Glu Val Ala Gln Gln Phe Thr His Ala Gly Gln Gly Gly Leu
170                 175                 180                 185

ATT GAA GAG CCC ACA GGT GAT GAG CTA CCA ACC AAG AAG GGG CGG AGG    626
Ile Glu Glu Pro Thr Gly Asp Glu Leu Pro Thr Lys Lys Gly Arg Arg
                190                 195                 200

AAC CGT TTC AAG TGG GGC CCA GCA TCC CAG CAG ATC CTG TTC CAG GCC    674
Asn Arg Phe Lys Trp Gly Pro Ala Ser Gln Gln Ile Leu Phe Gln Ala
            205                 210                 215

TAT GAG AGG CAG AAG AAC CCT AGC AAG GAG GAG CGA GAG ACG CTA GTG    722
Tyr Glu Arg Gln Lys Asn Pro Ser Lys Glu Glu Arg Glu Thr Leu Val
        220                 225                 230

GAG GAG TGC AAT AGG GCG GAA TGC ATC CAG AGA GGG GTG TCC CCA TCA    770
Glu Glu Cys Asn Arg Ala Glu Cys Ile Gln Arg Gly Val Ser Pro Ser
    235                 240                 245

CAG GCA CAG GGG CTG GGC TCC AAC CTC GTC ACG GAG GTG CGT GTC TAC    818
Gln Ala Gln Gly Leu Gly Ser Asn Leu Val Thr Glu Val Arg Val Tyr
250                 255                 260                 265

AAC TGG TTT GCC AAC CGG CGC AAA GAA GAA GCC TTC CGG CAC AAG CTG    866
Asn Trp Phe Ala Asn Arg Arg Lys Glu Glu Ala Phe Arg His Lys Leu
                270                 275                 280

GCC ATG GAC ACG TAC AGC GGG CCC CCC CCC AGG GCC AGG CCC GGG ACC    914
Ala Met Asp Thr Tyr Ser Gly Pro Pro Pro Arg Ala Arg Pro Gly Thr
            285                 290                 295

TGC GCT GCC CGC TCA CAG CTC CCC TGG CCT GCC TCC ACC TGC CCT CTC    962
Cys Ala Ala Arg Ser Gln Leu Pro Trp Pro Ala Ser Thr Cys Pro Leu
        300                 305                 310

CCC CAGTAAGGTC CACGGTGTGC GCTNTGGACA GCCTGCGACC AGTGAGACTG    1015
Pro

CAGAAGTACC CTCAAGCAGC GGCGGTCCCT TAGTGACAGT GTCTACACCC CTCCACCAAG    1075

TGTCCCCCAC GGGCCTGGAG CCCAGCCACA GCCTGCTGAG TACAGAAGCC AAGCTGGTCT    1135
C...o_>. -^-^-.T^^r AGCACTGCAC AGCTTGGASr    1195
AGAGATGGCG AGGCGTCAAC CAGCAGCCCC AGAACGTCAT CATGGCCTCA CTTCCTGGGG    1255
TCATGACCAT CGGGCGTGGT GAGCCTGCCT CCCTGGGTGG TAGGTTCACC AACAGAGGTC    1315
CCTCCACCCT GGTCATCGGC CTGGCCTCGA CGCAGGCACA GAGTGTGCCG GTCATCAACA    1375
GCATGGGCAG CAGCCTGACC ACGCTGCAGC CCGTCCAGTT CTCCCAGCCG CTGCACGCCT    1435
CCTACCAGCA GCCGCTCATG CGACCTGTGC AGAGCGATGT GACCCAGAGC CCCTTCATGG    1495
CCACCATGGC TGAGCTGCAG AGCCCCGACG CCGTCTACAG CCACAAGCCC GAGGTGGCCC    1555
AGTACACCCA CACGGGCCTG CTCCCGCAGA CTATGCTCAT CACCGACACG ACCAACGTGA    1615
GCGCCCTGGC CAGCCTCACG CCCACCAAGC AGGTCTTCAC CTCAGACACT GAGGCGTCCA    1675
GTGAGTCCGG GCTTCACACG CCGGCATCTC AGGCCACCAC CCTCCACGTC CCCAGCCAGG    1735
ACCCTGCCGG CATCCAGCAC CTGCAGGGGG CGCACCGGCT CAGCGCCAGG GCCACAGTGT    1795
CCTCCAGCAG CCTGGTGGTG TACCAGAGCT CAGACTCCAG CAATGGCCAG AGCGACCTGC    1855
TGCCATCCAA CCACAGCGTC ATCGAGACCT TCATCTCCAG CCAGATGGCG TCTTCCTGCC    1915
AGTAACCAGG GGACCTGGGC CCTGGGGCCT GTACTGCCTG CTTGGGGGGT GATGAGGGCA    1975
GCAGCCAGCC CTGCCTGGAG GACCTGAGCG TGCCGAGCAA CCGTGGCCCT TCCTGGACAG    2035
CTGTGGCTCG CTCCGCACTC TGCTCTGATG CATCAGAAAG GGAGGGCTCT GAGGCGCGCC    2095
AACCCGTGGA GGCTGCTCGG GGTGCACAGG AGGGGGTGGT GGAGAGCTAG GAGCAAAGCC    2155
TGTTCATGGC AGATGTAGGA GGGACTGTCG CTGCTTCGTG GGATACAGTC TTCTTACTTG    2215
GAACTGAAGG GGGCGGCCTA TGACTTGGGC ACCCCCAGCC TGGGCCTATG GAGAGCCCTG    2275
GGACCGCTAC ACCACTCTGG CAGCCACACT TCTCAGGACA CAGGCCTGTG TAGCTGTGAC    2335
CTGCTGAGCT CTGAGAGGCC CTGGATCAGC GTGGCCTTGT TCTGTCACCA ATGTACGCAC    2395
CGGGCCACTC CTTCCTGCCC CAACTCCTTC CAGCTAGTGA CCCACATGCC ATTTGTACTG    2455
ACCCCATCAC CTACTCACAC AGGCATTTCG TGGGTGGCTA CTCTGTGCCA GAGCCTGGGG    2515
CTCTAACTGC CTGAGCCCAG GGAGGCCGAA GCTAACAGGG AAGGCAGGCA GGGCTCTCCT    2575
GGTCTTCCCA TCCCCAGCGA TTCCCTCTCC CAGGCCCCAT GACCTCGAGC TTTCCTGTAT    2635
TTCTTCCCAA GAGCATGATG CCTCTGAGGC CAGCCTGGCC TCCTGCCTCT ACTGGGAAGG    2695
CTACTTCGGG GCTGGGAAGT CGTCCTTACT CCTGTGGGAG CCTCGCAACC CGTGCCAAGT    2755
CCAGGTCCTG GTGGGGCAGC TCCTCTGTCT CGAGCGGCCT GCAGACCCTG GCCTTGTTTG    2815
GGGCAGGAGT AGGTGAGGTC ACAAGGCAGG AAGGCCGGAG CAGGTGAGGA GGGCCGGGGA    2875






ACTGGCCAAG CTGAGGTGCC CAGGAGAAGA AAGAGGTGAC CCCAGGGCAC AGGAGCTACC    2935
TGTGTGGACA GGACTAACAC TCAGAAGCCT GGGTGCCTGG CTGGCTGAGG GCAGTTCGCA    2995
GCCACCCTGA GGAGTCTGAG GTCCTGAGCA CTGCCAGGAG GGACAAAGGA GCCTGTGAAC    3055
CCAGGACAAG CATGGTCCCA CATCCCTGGG CCTGCTGCTG AGAACCTGGC CTTCAGTGTA    3115
CCGCGTCTAC CCTGGGATTC AGGAAAAGGC CTGGGGTGAC CCGGCACCCC CTGCAGCTTG    3175
TAGCCAGCCG GGGCGAGTGG CACGTTTATT TAACTTTTAG TAAAGTCAAG GAGAAATGCG    3235
GTGA                                                                 3239



(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 314 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:



Met Val Ser Lys Leu Ser Gln Leu Gln Thr Glu Leu Leu Ala Ala Leu
                5                   10

Leu Glu Ser Gly Leu Ser Lys Glu Ala Leu Ile Gln Ala Leu Gly Glu
            20                  25                  30

Pro Gly Pro Tyr Leu Leu Ala Gly Glu Gly Pro Leu Asp Lys Gly Glu
        35                  40                  45

Ser Cys Gly Gly Gly Arg Gly Glu Leu Ala Glu Leu Pro Asn Gly Leu
    50                  55                  60

Gly Glu Thr Arg Gly Ser Glu Asp Glu Thr Asp Asp Asp Gly Glu Asp
65

Phe Thr Pro Pro Ile Leu Lys Glu Leu Glu Asn Leu Ser Pro Glu Glu
                85                  90

Ala Ala His Gln Lys Ala Val Val Glu Thr Leu Leu Gln Glu Asp Pro
            100                 105                 110

Trp Arg Val Ala Lys Met Val Lys Ser Tyr Leu Gln Gln His Asn Ile
        115                 120                 125

Pro Gln Arg Glu Val Val Asp Thr Thr Gly Leu Asn Gln Ser His Leu
    130                 135                 140

Ser Gln His Leu Asn Lys Gly Thr Pro Met Lys Thr Gln Lys Arg Ala
145                 150                 155

Ala Leu Tyr Thr Trp Tyr Val Arg Lys Gln Arg Glu Val Ala Gln Gln
                165                 170                 175

Phe Thr His Ala Gly Gln Gly Gly Leu Ile Glu Glu Pro Thr Gly Asp
                                185                 190

Glu Leu Pro Thr Lys Lys Gly Arg Arg Asn Arg Phe Lys Trp Gly Pro
                            200                 205

Ala Ser Gln Gln Ile Leu Phe Gln Ala Tyr Glu Arg Gln Lys Asn Pro
                        215

Ser Lys Glu Glu Arg Glu Thr Leu Val Glu Glu Cys Asn Arg Ala Glu
225                 230                 235                 240

Cys Ile Gln Arg Gly Val Ser Pro Ser Gln Ala Gln Gly Leu Gly Ser
                245                 250                 255

Asn Leu Val Thr Glu Val Arg Val Tyr Asn Trp Phe Ala Asn Arg Arg
                                265                 270

Lys Glu Glu Ala Phe Arg His Lys Leu Ala Met Asp Thr Tyr Ser Gly
                            280

Pro Pro Pro Arg Ala Arg Pro Gly Thr Cys Ala Ala Arg Ser Gln Leu
    290                 295                 300

Pro Trp Pro Ala Ser Thr Cys Pro Leu Pro
                    310



(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3236 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: modified_base

(B) LOCATION: 988

(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = A, C, G, or T"

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: join(24..986, 990..1271)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

CGTGGCCCTG TGGCAGCCGA GCC ATG GTT TCT AAA CTG AGC CAG CTG CAG    50
                          Met Val Ser Lys Leu Ser Gln Leu Gln
                          1

?hr f '''''' ^"■'^ ^""^ ^^<= AAA GAG GCA    98
Thr Glu Leu Leu Ala Ala Leu Leu Glu Ser Gly Leu Ser Lys Glu Ala
10                  15                  20                  25

CTG ATC CAG GCA CTG GGT GAG CCG GGG CCC TAC CTC CTG GCT GGA GAA    146
Leu Ile Gln Ala Leu Gly Glu Pro Gly Pro Tyr Leu Leu Ala Gly Glu
                30                  35                  40

GGC CCC CTG GAC AAG GGG GAG TCC TGC GGC GGC GGT CGA GGG GAG CTG    194
Gly Pro Leu Asp Lys Gly Glu Ser Cys Gly Gly Gly Arg Gly Glu Leu
            45                  50                  55

GCT GAG CTG CCC AAT GGG CTG GGG GAG ACT CGG GGC TCC GAG GAC GAG    242
Ala Glu Leu Pro Asn Gly Leu Gly Glu Thr Arg Gly Ser Glu Asp Glu
        60                  65

ACG GAC GAC GAT GGG GAA GAC TTC ACG CCA CCC ATC CTC AAA GAG CTG    290
Thr Asp Asp Asp Gly Glu Asp Phe Thr Pro Pro Ile Leu Lys Glu Leu
    75                  80                  85

GAG AAC CTC AGC CCT GAG GAG GCG GCC CAC CAG AAA GCC GTG GTG GAG    338
Glu Asn Leu Ser Pro Glu Glu Ala Ala His Gln Lys Ala Val Val Glu
90                  95                  100

ACC CTT CTG CAG GAG GAC CCG TGG CGT GTG GCG AAG ATG GTC AAG TCC    386
Thr Leu Leu Gln Glu Asp Pro Trp Arg Val Ala Lys Met Val Lys Ser
                110

TAC CTG CAG CAG CAC AAC ATC CCA CAG CGG GAG GTG GTC GAT ACC ACT    434
Tyr Leu Gln Gln His Asn Ile Pro Gln Arg Glu Val Val Asp Thr Thr
            125                 130                 135

GGC CTC AAC CAG TCC CAC CTG TCC CAA CAC CTC AAC AAG GGC ACT CCC    482
Gly Leu Asn Gln Ser His Leu Ser Gln His Leu Asn Lys Gly Thr Pro
        140                 145                 150

ATG AAG ACG CAG AAG CGG GCC GCC CTG TAC ACC TGG TAC GTC CGC AAG    530
Met Lys Thr Gln Lys Arg Ala Ala Leu Tyr Thr Trp Tyr Val Arg Lys
    155                 160                 165

CAG CGA GAG GTG GCG CAG CAG TTC ACC CAT GCA GGG CAG GGA GGG CTG    578
Gln Arg Glu Val Ala Gln Gln Phe Thr His Ala Gly Gln Gly Gly Leu
170                 175                 180                 185

ATT GAA GAG CCC ACA GGT GAT GAG CTA CCA ACC AAG AAG GGG CGG AGG    626
Ile Glu Glu Pro Thr Gly Asp Glu Leu Pro Thr Lys Lys Gly Arg Arg
                190                 195                 200

AAC CGT TTC AAG TGG GGC CCA GCA TCC CAG CAG ATC CTG TTC CAG GCC    674
Asn Arg Phe Lys Trp Gly Pro Ala Ser Gln Gln Ile Leu Phe Gln Ala
            205                 210                 215

TAT GAG AGG CAG AAG AAC CCT AGC AAG GAG GAG CGA GAG ACG CTA GTG    722
Tyr Glu Arg Gln Lys Asn Pro Ser Lys Glu Glu Arg Glu Thr Leu Val
        220                 225                 230

GAG GAG TGC AAT AGG GCG GAA TGC ATC CAG AGA GGG GTG TCC CCA TCA    770
Glu Glu Cys Asn Arg Ala Glu Cys Ile Gln Arg Gly Val Ser Pro Ser
    235                 240                 245
CAG GCA CAG GGG CTG GGC TCC AAC CTC GTC ACG GAG GTG CGT GTC TAC    818
Gln Ala Gln Gly Leu Gly Ser Asn Leu Val Thr Glu Val Arg Val Tyr
                    255

AAC TGG TTT GCC AAC CGG CGC AAA GAA GAA GCC TTC CGG CAC AAG CTG    866
Asn Trp Phe Ala Asn Arg Arg Lys Glu Glu Ala Phe Arg His Lys Leu
                270                 275                 280

GCC ATG GAC ACG TAC AGC GGG CCC CCC CCA GGG CCA GGC CCG GGA CCT    914
Ala Met Asp Thr Tyr Ser Gly Pro Pro Pro Gly Pro Gly Pro Gly Pro
            285                 290                 295

GCG CTG CCC GCT CAC AGC TCC CCT GGC CTG CCT CCA CCT GCC CTC TCC    962
Ala Leu Pro Ala His Ser Ser Pro Gly Leu Pro Pro Pro Ala Leu Ser
        300                                     310

CCC AGT AAG GTC CAC GGT GTG CGC TNT GGA CAG CCT GCG ACC AGT GAG    1010
Pro Ser Lys Val His Gly Val Arg     Gly Gln Pro Ala Thr Ser Glu

Jhr Al^ rT ^""^ ''^^ ^CA GTG TC"    1058
Thr Ala Glu Val Pro Ser Ser Ser Gly Gly Pro Leu Val Thr Val Ser
    330                 335

?hr P^o l""" m"^^ """^ ^""^ CAC AGC    1106
Thr Pro Leu His Gln Val Ser Pro Thr Gly Leu Glu Pro Ser His Ser
                    350                                     360

CTG CTG AGT ACA GAA GCC AAG CTG GTC TCA GCA GCT GGG GGC CCC CTC    1154
Leu Leu Ser Thr Glu Ala Lys Leu Val Ser Ala Ala Gly Gly Pro Leu
                365                                     375

Pro A u ^ ''''^ ^""^ '''''' ""^^ GAC ATC CCC    1202
Pro Arg Gln His Pro Asp Ser Thr Ala Gln Leu Gly Ala Asp Ile Pro
            380                 385                 390

AGG CCT CAA CCA GCA GCC CCA GAA CCT CAT CAT GGC CTC ACT TCC TGG    1250
Arg Pro Gln Pro Ala Ala Pro Glu Pro His His Gly Leu Thr Ser Trp
        395                 400                 405

GGT CAT GAC CAT CGG GCC TGG TGAGCCTGCC TCCCTGGGTC CTACGTTCAC    1301
Gly His Asp His Arg Ala Trp
    410

CAACACAGGT GCCTCCACCC TGGTCATCGG CCTGGCCTCC ACGCAGGCAC AGAGTGTGCC    1361
GGTCATCAAC AGCATGGGCA GCAGCCTGAC CACCCTGCAG CCCGTCCAGT TCTCCCAGCC    1421
GCTGCACCCC TCCTACCAGC AGCCGCTCAT GCCACCTGTG CAGAGCCATG TGACCCAGAG    1481
CCCCTTCATG GCCACCATGG CTCAGCTGCA GAGCCCCCAC GCCCTCTACA GCCACAAGCC    1541
CGAGGTGGCC CAGTACACCC ACACGGGCCT GCTCCCGCAG ACTATGCTCA TCACCGACAC    1601
CACCAACCTG AGCGCCCTGG CCAGCCTCAC GCCCACCAAG CAGGTCTTCA CCTCAGACAC    1661
TGAGGCCTCC AGTGAGTCCG GGCTTCACAC GCCGGCATCT CAGGCCACCA CCCTCCACGT    1721
CCCCAGCCAG GACCCTGCCG GCATCCAGCA CCTGCAGCCG GCCCACCGGC TCAGCGCCAG    1781
CCCCACAGTG TCCTCCAGCA GCCTGGTGCT GTACCAGAGC TCAGACTCCA GCAATGGCCA    1841
GAGCCACCTG CTGCCATCCA ACCACAGCGT CATCGAGACC TTCATCTCCA CCCAGATGGC    1901
CTCTTCCTCC CAGTAACCAC GGCACCTGGG CCCTGGGGCC TGTACTGCCT GCTTGGGGGG    1961
TGATGAGGGC AGCAGCCAGC CCTGCCTGGA GGACCTGAGC CTGCCGAGCA ACCGTGGCCC    2021
TTCCTGGACA GCTGTGCCTC GCTCCCCACT CTGCTCTGAT GCATCAGAAA GGGAGGGCTC    2081
TGAGGCGCCC CAACCCGTGG AGGCTGCTCG GGGTGCACAG GAGGGGGTCG TGGAGAGCTA    2141
GGAGCAAAGC CTGTTCATGG CAGATGTAGG AGGGACTGTC GCTGCTTCGT GGGATACAGT    2201
CTTCTTACTT GGAACTGAAG GGGGCGGCCT ATGACTTGGG CACCCCCAGC CTGGGCCTAT    2261
GGAGAGCCCT GGGACCGCTA CACCACTCTG GCAGCCACAC TTCTCAGGAC ACAGGCCTGT    2321
GTAGCTGTGA CCTGCTGAGC TCTGAGAGGC CCTGGATCAG CGTGGCCTTG TTCTGTCACC    2381
AATGTACCCA CCGGGCCACT CCTTCCTGCC CCAACTCCTT CCAGCTAGTG ACCCACATGC    2441
CATTTGTACT GACCCCATCA CCTACTCACA CAGGCATTTC CTGGGTGGCT ACTCTGTGCC    2501
AGAGCCTGGG GCTCTAACTG CCTGAGCCCA GGGAGGCCGA AGCTAACAGG GAAGGCAGGC    2561
AGGGCTCTCC TGGTCTTCCC ATCCCCAGCG ATTCCCTCTC CCAGGCCCCA TGACCTCCAG    2621
CTTTCCTGTA TTTCTTCCCA AGAGCATGAT GCCTCTGAGG CCAGCCTGGC CTCCTGCCTC    2681
TACTGGGAAG GCTACTTCGG GGCTGGGAAG TCGTCCTTAC TCCTGTGGGA GCCTCGCAAC    2741
CCGTGCCAAG TCCAGGTCCT GGTGGGGCAG CTCCTCTGTC TCGAGCGCCC TGCAGACCCT    2801
GCCCTTGTTT GGGGCAGGAG TAGCTGAGCT CACAAGGCAG CAAGGCCCGA GCAGCTGAGC    2861
AGGGCCGGGG AACTGGCCAA GCTGAGGTGC CCAGGAGAAG AAAGAGGTGA CCCCAGGGCA    2921
CAGGAGCTAC CTGTGTGGAC AGGACTAACA CTCAGAAGCC TGGGTGCCTG GCTGGCTGAG    2981
GGCAGTTCGC AGCCACCCTG AGGAGTCTGA GGTCCTGAGC ACTGCCAGGA GGGACAAAGG    3041
AGCCTGTGAA CCCAGGACAA GCATGGTCCC ACATCCCTGG GCCTGCTGCT GAGAACCTGG    3101
CCTTCAGTGT ACCGCGTCTA CCCTGGGATT CAGGAAAAGG CCTGGGGTGA CCCGGCACCC    3161
CCTGCAGCTT GTAGCCAGCC GGGGCGAGTG GCACGTTTAT TTAACTTTTA GTAAAGTCAA    3221
GGAGAAATGC GGTGG                                                     3236

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 415 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

Met Val Ser Lys Leu Ser Gln Leu Gln Thr Glu Leu Leu Ala Ala Leu
1 5 10 15

Leu Glu Ser Gly Leu Ser Lys Glu Ala Leu Ile Gln Ala Leu Gly Glu
20 25 30

Pro Gly Pro Tyr Leu Leu Ala Gly Glu Gly Pro Leu Asp Lys Gly Glu
35 40 45

Ser Cys Gly Gly Gly Arg Gly Glu Leu Ala Glu Leu Pro Asn Gly Leu
50 55 60

Gly Glu Thr Arg Gly Ser Glu Asp Glu Thr Asp Asp Asp Gly Glu Asp
65 70 75 80

Phe Thr Pro Pro Ile Leu Lys Glu Leu Glu Asn Leu Ser Pro Glu Glu
85 90 95

Ala Ala His Gln Lys Ala Val Val Glu Thr Leu Leu Gln Glu Asp Pro
100 105 110

Trp Arg Val Ala Lys Met Val Lys Ser Tyr Leu Gln Gln His Asn Ile
115 120 125

Pro Gln Arg Glu Val Val Asp Thr Thr Gly Leu Asn Gln Ser His Leu
130 135 140

Ser Gln His Leu Asn Lys Gly Thr Pro Met Lys Thr Gln Lys Arg Ala
145 150 155 160

Ala Leu Tyr Thr Trp Tyr Val Arg Lys Gln Arg Glu Val Ala Gln Gln
165 170 175

Phe Thr His Ala Gly Gln Gly Gly Leu Ile Glu Glu Pro Thr Gly Asp
180 185 190

Glu Leu Pro Thr Lys Lys Gly Arg Arg Asn Arg Phe Lys Trp Gly Pro
195 200 205

Ala Ser Gln Gln Ile Leu Phe Gln Ala Tyr Glu Arg Gln Lys Asn Pro
210 215 220

Ser Lys Glu Glu Arg Glu Thr Leu Val Glu Glu Cys Asn Arg Ala Glu
225 230 235 240

Cys Ile Gln Arg Gly Val Ser Pro Ser Gln Ala Gln Gly Leu Gly Ser
245 250 255

Asn Leu Val Thr Glu Val Arg Val Tyr Asn Trp Phe Ala Asn Arg Arg
260 265 270

Lys Glu Glu Ala Phe Arg His Lys Leu Ala Met Asp Thr Tyr Ser Gly
275 280 285

Pro Pro Pro Gly Pro Gly Pro Gly Pro Ala Leu Pro Ala His Ser Ser
290 295 300

Pro Gly Leu Pro Pro Pro Ala Leu Ser Pro Ser Lys Val His Gly Val
305 310 315 320

Arg Gly Gln Pro Ala Thr Ser Glu Thr Ala Glu Val Pro Ser Ser Ser
325 330 335

Gly Gly Pro Leu Val Thr Val Ser Thr Pro Leu His Gln Val Ser Pro
340 345 350

Thr Gly Leu Glu Pro Ser His Ser Leu Leu Ser Thr Glu Ala Lys Leu
355 360 365

Val Ser Ala Ala Gly Gly Pro Leu Pro Arg Gln His Pro Asp Ser Thr
370 375 380

Ala Gln Leu Gly Ala Asp Ile Pro Arg Pro Gln Pro Ala Ala Pro Glu
385 390 395 400

Pro His His Gly Leu Thr Ser Trp Gly His Asp His Arg Ala Trp
405 410 415



(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 13 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: modified_base

(B) LOCATION: 7

(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = A, C, G, or T"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

GTTAATNATT ACC 13
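
For illustration only, the ambiguity code in SEQ ID NO: 9 can be expanded into the concrete sequences it covers. The following is a minimal Python sketch, not part of the listing; the function name `expand_ambiguous` is hypothetical, and it assumes the sequence as transcribed above with N = A, C, G, or T at position 7.

```python
from itertools import product

# SEQ ID NO: 9 as listed; "N" at position 7 stands for A, C, G, or T.
SEQ_ID_9 = "GTTAATNATTACC"


def expand_ambiguous(seq, alphabet="ACGT"):
    """Return every concrete sequence obtained by substituting each N in seq."""
    # Each position contributes either its fixed base or all four bases.
    slots = [alphabet if base == "N" else base for base in seq]
    return ["".join(choice) for choice in product(*slots)]


variants = expand_ambiguous(SEQ_ID_9)
for variant in variants:
    print(variant)
```

With a single N the sketch yields four 13-base sequences; a sequence with k ambiguous positions would yield 4^k.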



(2) INFORMATION FOR SEQ ID NO : 10: 

(1) SEQUENCE CHARACTERISTICS; 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY : linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
CGGTGGGTAC ATTGGTGACA GAAC 

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
GGCAGGCAAA CGCAACCCAC G 

(2) INFORMATION FOR SEQ ID NO : 13: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
GAAGGGGGGC TCGTTAGGAG C 

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: linear 

(XI) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
CATGCACAGT CCCCACCCTC A 

(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY : linear 

(XI) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
CTTCCAGCCC CCACCTATGA G 



(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
GGGCAAGGTC AGGGGAATGG A 



(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(Xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
CAGCCCAGAC CAAACCAGCA C 



(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
CAGAACCCTC CCCTTCATGC C 



(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
GGTGACTGCT GTCAATGGGA C 



(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:
GGCAGACAGG CAGATGGCCT A 



(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS ; single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:
GCCTCCCTAG GGACTGCTCC A 



(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY : linear 

(Xl) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 
TGGAGCAGTC CCTAGGGAGG C 
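
As transcribed here, SEQ ID NO: 21 and SEQ ID NO: 22 read as reverse complements of one another. The following Python sketch shows one way to check such a relationship between two listed primers; it is illustrative only, the helper name `reverse_complement` is hypothetical, and it assumes the two sequences exactly as they appear in this listing.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement every base, then reverse the strand orientation."""
    return seq.translate(COMPLEMENT)[::-1]


# Primer sequences copied from the listing with the block spacing removed.
SEQ_ID_21 = "GCCTCCCTAGGGACTGCTCCA"
SEQ_ID_22 = "TGGAGCAGTCCCTAGGGAGGC"

is_pair = reverse_complement(SEQ_ID_21) == SEQ_ID_22
print(is_pair)
```

The same check can be applied to any other pair of primers in the listing to see whether they anneal to opposite strands of the same site.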


(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 
GTTGCCCCAT GAGCCTCCCA C

(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:
GGTCTTGGGC AGGGGTGGGA T 

(2) INFORMATION FOR SEQ ID NO : 25: 

(1) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: linear 

(XI) SEQUENCE DESCRIPTION: SEQ ID NO : 25: 
CTGCAATGCC TGCCAGGCAC C 

(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:
CCCCTGCATC CATTGACAGC C 

(2) INFORMATION FOR SEQ ID NO : 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

( C ) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(XI) SEQUENCE DESCRIPTION: SEQ ID NO: 27 
GAGGCCTGGG ACTAGGGCTG T 

(2) INFORMATION FOR SEQ ID NO : 28: 
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:
CTOTGTCAOA GGCCGAGGGA G 



(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:
CCTGTGACAG AGCCCCTCAC C

(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:
CGGACAGCAA CAGAA3GGGT 'G 



(2) INFORMATION FOR SEQ ID NO : 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(XI) SEQUENCE DESCRIPTION: SEQ ID NO: 31 
CAGAGCCCCT CACCCCCACA T 



(2) INFORMATION FOR SEQ ID NO : 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:
GTACCCCTAG GGACAGGCAG G 

(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY : linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: 
ACCCCCCAAG CAGGCAGTAC A 

(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 671 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 104..217

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 

GCAGAGAGGG CACTGGGAGG AGGCAGTGGG AGGGCGGAGG GCGGGGGCCT TCGGGGTGGG 60

CGCCCAGGGT AGGGCAGGTG GCCGCGGCGT GGAG3CAGGG AGA ATG CGA CTC TCC 115
Met Arg Leu Ser
1

AAA ACC CTC GTC GAC ATG GAG ATG GCC GAC TAC AGT GCT GCA CTG GAC 163
Lys Thr Leu Val Asp Met Asp Met Ala Asp Tyr Ser Ala Ala Leu Asp
5 10 15 20

CCA GCC TAC ACC ACC CTG GAA TTT GAG AAT GTG CAG GTG TTG ACG ATG 211
Pro Ala Tyr Thr Thr Leu Glu Phe Glu Asn Val Gln Val Leu Thr Met
25 30 35

GGC AAT GGTAGGTGGG GGCAGATGTG CCCAGGTGTG CCAGTGGGGG CAGGTGTGCC 267
Gly Asn

TGGGTCCAGG AGCAGATCTT TGGCACTCAA CTTTGGGGTG GGAGGAGAAT GATACAAAAT 327

GGTAGGTTGG TCCTACAGGC CAGCACAGGT GTTGCCAAGT GAAGCCCATG TGCCCAGGCA 387

CAGTGATCAC AGGCATTCTG GGTGAAGGGA GGCCTGCAAG GGCCAATTTC CAGCAAAAGT 447

CGATCCCGGC TATTCCTCCC AGGCCCTTCC AGTCCTCACT GCCTCACAGT GGCTCTGCTT 507

Lo^CCAGC.T CA3CGATTC.
o^__T.o o.-._„GG.T CA3A3A3AC3 3CAA3GGAT
-.M..^.. -^^^^^^.Go.G GTTGGAGACA TAArCGCATT TGTG

(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 38 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:

Met Arg Leu Ser Lys Thr Leu Val Asp Met Asp Met Ala Asp Tyr Ser
1 5 10 15

Ala Ala Leu Asp Pro Ala Tyr Thr Thr Leu Glu Phe Glu Asn Val Gln
20 25 30

Val Leu Thr Met Gly Asn
35



(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 796 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: join(286..312, 316..375)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:
TGGATGTTTG TACATGTGTG CTGTGTGTGC GGGTCATAGA GCACATGTGT TTGTGCATGC 60
GGACCTGTTG GAGTGCCCTG TTCTTCCTGC ATCTTTATCC TGTATGGGCG TTTTGTCGTG 120
TGCCCATATT TGTACCTGCT GTGTATATAT GCAGTTCCCT GTGCTGCGGG CGGGGGTCAG 180
CGGTCTCTGG TGTGCACGAC TGCACAGACC CAAATGCAGG ACTCTGTTGT TGCCACTCAC 240
CAAGTGAGAT TCATATCAGC AACATGTCCG TTTOTCTCTG AGCAG ATT TGT TGC 294
Ile Cys Cys
1

CGC TGC GTC TCG CCA GAT TGA GGC ATC CCC TCC GAC ATC ACT GGA GCA 342
Arg Cys Val Ser Pro Asp Gly Ile Pro Ser Asp Ile Thr Gly Ala
5 10 15

TAT CTG GAG GGG TGG ACA GTT CTC CAC AGG GAG GTAG.oGAA^ 395
Tyr Leu Glu Gly Trp Thr Val Leu His Arg Glu
20 25

CGGAAACCCC TCCTGGAGGG AAGAGCCCCA TCGGTCCCAG GCCAGCCTCA GAGGAGAGGG 455
GGCAGOCAGC TGGCTGAGCT CAGCCTGCCA CCCTGCTTCC TTCTGTGTCT TGGAGCCACT 515
CAGCCAGTAT GAGGCTGCAG CTCCAGCTGA GGTCTOGAAT CTTGTGGTCA GCTCAOCTAG 575
GGTGAGGAGG CAGCTGCTGG GCACTGCTTG TTGTCAGCTC AGCAGGTGCT CACCTOCCCC 635
TGCCGTCCAG TCACGTGTGA CCTTGGGCAT GTCACCTCCC CTATCCTGGC TTCTGTATCT 695
TCTACAAAAC AGOCTTCATT CCCCCAGOCC TGCTGGCTGG ACGGCTTTTA GGCCTGTCTG 755
AGGACCACGC CAGGAGCGCA AGGCAAAAAC ACACCAGAGA T 796

(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 29 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
Ile Cys Cys Arg Cys Val Ser Pro Asp Gly Ile Pro Ser Asp Ile Thr
1 5 10 15

Gly Ala Tyr Leu Glu Gly Trp Thr Val Leu His Arg Glu
20 25
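
The CDS feature of a nucleotide entry and the length of the paired protein entry can be cross-checked arithmetically. The Python sketch below is offered only as an illustration; the helper name `cds_length` is hypothetical, and it assumes that the CDS location of SEQ ID NO: 36 reads join(286..312, 316..375), which is how the damaged location line is interpreted here.

```python
import re


def cds_length(location):
    """Sum the lengths of the inclusive start..end ranges in a CDS location."""
    ranges = re.findall(r"(\d+)\.\.(\d+)", location)
    return sum(int(end) - int(start) + 1 for start, end in ranges)


# Assumed reading of the CDS location for SEQ ID NO: 36.
location_36 = "join(286..312, 316..375)"

n_bases = cds_length(location_36)
n_codons, remainder = divmod(n_bases, 3)
print(n_bases, n_codons, remainder)
```

Under that reading the two segments cover 27 + 60 = 87 bases, that is 29 complete codons, which agrees with the 29 amino acids stated for SEQ ID NO: 37.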



(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 634 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 326..499

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:
CCCCTTGCGA GTTAGGAGGC CGGCTCCCAC CCCAGAAGGT GGCCAGGTTT TCATGCCTTC 60
CTAGA3AAAG CTGGGGCTGG TGGCCTCCAC CACAGGGAGA CGCAGACCCT CAGAAACAAG 120
TCTGTGAAGT CACAACCAGC CCCAGTTTAC AGATGTGAAA CTGAAGCTCC AAAAAGTCAG 180



GAGGTCAr.G AGT33G3AG3 TGATGGAGTG GAA^AG • .

- -"^"-^.o^C 7GAGGCCGAA. 

GCCCTGGAGA GATGCCGGCA AGGC-^^"— •-r-^^^-... 

CATTC7GTTC TTCGTGAAGC 

CTCAGTCCCT TCTCTCCTGG CGGAG ACA CGT CC- CA" "Ar 

^ CCA A""'^ 

^nr Arg Pro Has Glr. Lys Ala Pro Th^ 
5 

TCA ACG CGC CCA ACA G."" — - - 

Ser Thr Arg Pro Thr Al'a' vl^ Se^r A^a ^'"^ 

10 P Ala Pro Cys Val Pro Ser Ala 

^° 25 

GGG ACC GGG CCA CGG GCA AA- a-t 

- -J - =.= - ... 3.0 ... 



GCA AGG GCT TCT TCr^ rri^ r-^y. 

Ala Arg .la Ser Se" Cys GlC '^^^^ 

y Ala Cys Gly Arg Thr Thr Cys Thr Pro 



55 



•^CA GGTGAGGAGC C^CAATT-^^t ^^^^ 

Ala C..AATT..T ..hGCTGGGA AATGGGCACA CTTG3GCTCA 



(2) INFORMATION FOR SEQ ID NO: 39:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 58 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:



Thr Arg Pro H.s Gin Lys Ala Pro Thr Ser T.r Arg Pro T.r Ala Trp 
Val ser Ala Pro Cy. Val Pro Ser Ala Oly T.r Gly Pro Arg Ala^ ^n 
-r T.r val Pro Arg Ala Val T.r Ala Ala Arg AU Ser Ser Gly Oly 
Ala Cys Gly Arg Thr Thr Cys Thr Pro Ala 



45 



TGGCCCCAAG GTCTCTCtt- -r-^^^-^^^^

C^C... ..CCTGAGTG GGTAGGTCCC AGAGACAGCT GCCCTTCAGG

GCCTTCAAGG CTCTTCTGGT TTTGT 634

(2) INFORMATION FOR SEQ ID NO: 40:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 458 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: join(171..173, 177..265)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:

AGAGAGTTCA TAGCACCTTT CCAGCTCCTG GTGGGTTCAA GAGAGAACTC CCGGGATGAA 60

GAGATGAGAG CACTGAGGTT GGGGGGTCAA CTGGATAGCC AGGGCCCTAG TTCTGTCCTA 120

AGAGGAGGAA GTTGTGTCTT CTCCATCCAA CCATCCAAAG CCCTCCCCAG ATT 173

Ile
1

TAG CCG GCA GTG CGT GGT GGA CAA AGA CAA GAG GAA CCA GTG CCG CTA 221
Pro Ala Val Arg Gly Gly Gln Arg Gln Glu Glu Pro Val Pro Leu

5 10 15

CTG CAG GCT CAA GAA ATG CTT CCG GGC TGG CAT GAA GAA GGA 263

Leu Gln Ala Gln Glu Met Leu Pro Gly Trp His Glu Glu Gly
20 25 30

AGGTGAGCCT CGGCCCTCCC CGCCCCACCA CCACTGCCCC ACCTGCACCC ACAGCTCCCC 323

GACAGTCATT TACAACTGTA GCCACACTTT ATGACTCAGT GGCAGGCCCC AGGGTGACTG 383

GCTAATGGCT GAGAAGAGGG AGGGCCTGGA AATCTGACCA TAGGGAGCGG CTGGGCTTGG 443

TCTTGAGAAA GATTC 458

(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:

Ile Pro Ala Val Arg Gly Gly Gln Arg Gln Glu Glu Pro Val Pro Leu
1 5 10 15

Leu Gln Ala Gln Glu Met Leu Pro Gly Trp His Glu Glu Gly
20 25 30



(2) INFORMATION FOR SEQ ID NO: 42:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 662 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: CDS
(B) LOCATION: 84..188

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:

TCCCACTCCT CATCAGTCAC AGACACCOCC ACCCCCTACT ZCATCCCT:2T TCTCCCTCCT 60

CACCTCTOTG TGCCTCCTCA CAG CCG TCC AGA ATG AGC GGG ACC GGA TCA 110

Pro Ser Arg Met Ser Gly Thr Gly Ser
1 5

GCA CTC GAA GGT CAA GCT ATG AGG ACA GCA GCC TGC CCT CCA TCA ATG 158

Ala Leu Glu Gly Gln Ala Met Arg Thr Ala Ala Cys Pro Pro Ser Met

10 15 20 25

CGC TCC TGC AGG CGG AGG TCC TGT CCC GAC AGGTACCGGG GTGATCCTGC 208

Arg Ser Cys Arg Arg Arg Ser Cys Pro Asp
30 35

CACCCACCCA GGGGATCCCC CACACTACAG AGGAGCTCAC CTCCTCCACC TCCATTCTCC 268
CCAGCCAGGC CCTGGAGCAG CTGACGGGAG GGGCCTCAGA TATTACAGAA GGGACACTGA 328
GTGCGGTTTC ACATGGCCCA GTTTGCAGCA AGGGCAGGAA TCGAACCTGG CGCCCTGGGG 388

CACTTTCTAA TTCATCCTAC TGCCTGCATC CCACAGGCCA AGCAGAGTCT TCACCTTCAC 448

TGAGGGCCTG CGATCAGCTC AGCTCCGAGA GAACAGAGCA GTGGCTCAGT GGAGAGAGGT 508
GGCAAAGTGG GGCCCAGCCC TTCCCTTGCT GAGTGACCTT GGGCAAGTCA CAGCACCTCT 568
CTGAGCCATG GTTGCCTCAT TGTCAGAAAA GGATGATGAT TTTTTGCCCT GCTTCTCCTC 628
TAAGGCTGAC AGACTCCTTG GGGCTCTAAA GCTG 662

(2) INFORMATION FOR SEQ ID NO: 43:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 35 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:

Pro Ser Arg Met Ser Gly Thr Gly Ser Ala Leu Glu Gly Gln Ala Met
1 5 10 15

Arg Thr Ala Ala Cys Pro Pro Ser Met Arg Ser Cys Arg Arg Arg Ser
20 25 30

Cys Pro Asp
35

(2) INFORMATION FOR SEQ ID NO: 44:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 647 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : Single 

(D) TOPOLOGY: linear 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 185..340

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:

TTCTCCCTCA TCCCTGCCTC CTCCCTCCCT CCGTTTTTAC CCTGAGCTTC CTTCAGAGCT 60

GGAGGGCACC CACTATCCAG CCCCCTCCCC ACATCTGATT CCAGGGAGGG GGCTCTGTGC 120

AGGGGACAGA GAATGCGGGA GGGCCCGGAC ATCTCCAGCA TTTTCTTCCC TGTATCTCTC 180

GAAG ATC ACC TCC CCC GTC TCC GGG ATC AAC GGC GAC ATT CGG GCG AAG 229

Ile Thr Ser Pro Val Ser Gly Ile Asn Gly Asp Ile Arg Ala Lys
1 5 10 15

AAG ATT GCC AGC ATC GCA GAT GTG TGT GAG TCC ATG AAG GAG CAG CTG 277

Lys Ile Ala Ser Ile Ala Asp Val Cys Glu Ser Met Lys Glu Gln Leu
20 25 30

CTG GTT CTC GTT GAG TGG GCC AAG TAC ATC CCA GCT TTC TGC GAG CTC 325

Leu Val Leu Val Glu Trp Ala Lys Tyr Ile Pro Ala Phe Cys Glu Leu
35 40 45

CCC CTG GAC GAC CAG GTGAGGATGG GCGTGGATGG TGGGCAGTAG TGGGCAGTGG 380
Pro Leu Asp Asp Gln

50

GCGGGGCAGC CAGGGGGCTG CTGGCCCACC TGGGATATAG CCGTGGACTG GCTTGATTTT 440

ATTTTATTTA ACAAAATATG TAGTGCACAC ACGTGTCTGA AACTTTAAAT CACCTTACAA 500

ATATTAACTC AGTTAGCTCC TCCAACAACT CTATGAGGTA GGTACTAAGG TACTATTATT 560

ACTGCCATCT CATAGGTGAG AGATTGGGGC ACAGAGAGGT TAAGTAACCT GCTCAAGGTC 620

ACATAGCTAC TATCCAGCAT AGCTGGG 647



(2) INFORMATION FOR SEQ ID NO : 45: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 52 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:

Ile Thr Ser Pro Val Ser Gly Ile Asn Gly Asp Ile Arg Ala Lys Lys
1 5 10 15

Ile Ala Ser Ile Ala Asp Val Cys Glu Ser Met Lys Glu Gln Leu Leu
20 25 30

Val Leu Val Glu Trp Ala Lys Tyr Ile Pro Ala Phe Cys Glu Leu Pro
35 40 45

Leu Asp Asp Gln
50
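
The correspondence between an exon's coding triplets and the paired amino acid entry can be reproduced with the standard genetic code. The Python sketch below is illustrative only and is not part of the listing; the names `CODON_TABLE` and `translate` are hypothetical, and the input is assumed to be the first fifteen codons of the CDS of SEQ ID NO: 44 as printed in that entry.

```python
# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate complete codons of dna into one-letter amino acid codes."""
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First coding row of SEQ ID NO: 44 as listed (codon spacing removed below).
first_row = "ATC ACC TCC CCC GTC TCC GGG ATC AAC GGC GAC ATT CGG GCG AAG"
peptide = translate(first_row.replace(" ", ""))
print(peptide)
```

The one-letter result corresponds to Ile Thr Ser Pro Val Ser Gly Ile Asn Gly Asp Ile Arg Ala Lys, the first fifteen residues of SEQ ID NO: 45.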



(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 844 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 429..515

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46:

ATTTTTACAA AGCACCCTTC ATAATTCTCC ATAGCTGGTC CATGGGTGGG AATTTGGGAC 60

CCACAGTTTT GGAACTTTTT GGGATCATAG ACCTTTTTGA GAATCTCAAA AAAGAAAAAA 120

AAGCACACAG AATGTTGCTT ACAGTTTCAT CAGGCACACA GAAGAGGCCC AGCACGAAGC 180

AGTTTCTTGC CCAAGGACAC AGCAGTTCAA GGACAGAGTC AGCGCGAGGT CTCTCAGCTC 240

TGAGCACATG TTCTTTCCCC TTCCAGGTTT CTAGTTTTAT GGGTAGTAGT TTTATGATGC 300

CCATTTCACA GTTCAGGCAG GTAGAGGCAG AGGGGAGCAT TAAGCTGACT TGCCCAGCGT 360

CACTGAGTTG GCTACGGGCA GCCTTCCCAA GGGTACAGAT GGCAAACACT GTTCCTTATC 420

TCTTTCAG GTG GCC CTG CTC AGA GCC CAT GCT GGC GAG CAC CTG CTG CTC 470
Val Ala Leu Leu Arg Ala His Ala Gly Glu His Leu Leu Leu
1 5 10

GGA GCC ACC AAG AGA TCC ATG GTG TTC AAG GAC GTG CTG CTC CTA 515
Gly Ala Thr Lys Arg Ser Met Val Phe Lys Asp Val Leu Leu Leu
15 20 25

GGTGAGGCGG CTGCCTGCCC TGGCCAGGGC TCCAGGGAGG GTATGCCTAG CATGGCACTC 575

ACCCAGGCAA GGAGATTCAC ATGGTGGCAT GCAAGGGTGA GGGAGACTAG TCAGGAGTGG 635

CCCTGTCCTC AGGCTTGCAT TGGAGGGCTC CAGGACTCAG TTTTCAACTG GGTACCCCAC 695

TCAGATGCAA GGAAATGTGG ATGCAAGTCA CCAAATTCCC AGCATTGAAG TCAGAGCACG 755

ATCAGGGTTA TCCCTGGAAT TACCTGTGCA TCCTTTTTTC TTTTGACAGA GTCTTGCTCT 815

GTCACTCAGG CTGGAGTGCA ATGATGTGA 844

(2) INFORMATION FOR SEQ ID NO: 47:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 29 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:

Val Ala Leu Leu Arg Ala His Ala Gly Glu His Leu Leu Leu Gly Ala
1 5 10 15

Thr Lys Arg Ser Met Val Phe Lys Asp Val Leu Leu Leu
20 25

(2) INFORMATION FOR SEQ ID NO: 48:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 937 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: join(485..529, 533..640)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48:

GCAACACTAG TATTTTAATA TAACAATGCT ATGAGGGAGC TCGATTATTT ATCCTCATCT 60

TATAGATAAG AAAACTGAGG CACAGAGAGG TTAAGTAACT TATCCAACTA TAACCAGCTA 120

TCAGGGGCAG AGCCATTTAA GCAGGGCAGT GCAGTTCCAG AATCTGGTCC TTTAACCTTG 180

ATGCTTTGGT GCCTATCAGG TGACCTTTGA ATGTCATCGA TCTTGTGAGT CATGTTGGTA 240

AATGGAGCTT GGGTCATGTG AAAGAGGTCC TAGAAAGCCA AGTTCCAAGC TCAGCCGGAT 300

GACTCAAGGC AGCTTATCTT CTGAATCTGG GCCTCAGCTT CCTTACCTGT GAAATGGGAG 360

TCACCATCCC TGCAGGTCCT CCTCCCACAG GCACCAGCTA TCTTGCCAAC TTAAAAGCCA 420

AAACTAGAGG AGAGGGGTCA ACCCAAAGTG ACTTCCCATC CTCCCTCCCT CCCAACCCTT 480

CCAG GCA ATG ACT ACA TTG TCC CTC GGC ACT GCC CGG AGC TGG CGG AGA 529

Ala Met Thr Thr Leu Ser Leu Gly Thr Ala Arg Ser Trp Arg Arg
1 5 10 15

GCC GGG TGT CCA TAC GCA TCC TTG ACC AGC TGG TGC TGC CCT TCC
Ala Gly Cys Pro Tyr Ala Ser Leu Thr Ser Trp Cys Cys Pro Ser

AGG AGC TGC AGA TCG ATG ACA ATG AGT ATG CCT ACC TCA
Arg Ser Cys Arg Ser Met Thr Met Ser Met Pro Thr Ser Lys Pro Ser
35 40 45

TCT TCT TTG ACC CAG GTACAGTGCA CACCTCCTAA GCCATCCCTG ACTCTCTCT" 680
Ser Ser Leu Thr Gln
50

CAGAACGCTC TGCCAGACTT CTCCTATTGG GTTCTGTACA CTGAGTTCAC AGCCTCATCT 740
CATGTTAACG ACAGCCAGGA GAGGCCGTTT TCATTTAACA GATGAGGCAA GTCAAGATTT 800
GAAGAGACAA TATGGCCGGG CGCAGTGGCT CACACCTGTA ATCCCATCAC TTTGGGAGGC 860
TGAGGCG3GC GGATCACCTG AGGTCAGGGG TCAAGATGAG CCTGGCTAAC ATGGAGAAAC 920
CCCATCTCTA CTTAAAA 937

(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 51 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49:

Ala Met Thr Thr Leu Ser Leu Gly Thr Ala Arg Ser Trp Arg Arg Ala
1 5 10 15

Gly Cys Pro Tyr Ala Ser Leu Thr Ser Trp Cys Cys Pro Ser Arg Ser
20 25 30 

Cys Arg Ser Met Thr Met Ser Met Pro Thr Ser Lys Pro Ser Ser Ser 
35 40 45 

Leu Thr Gin 
50 



(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 978 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: CDS
(B) LOCATION: join(376..387, 391..432, 436..534, 538..610)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50:

GTGGCTCTGC CAACAACTGG CTGTGCGACC CAGGACAAGT CCTATCTTTG CACTGTGTCT 60

GGGTTTCCCC GTGTGTAAGA TGAGGCGGTT GCTAGGTGCT TATTGGATGC ATTCCTCAAG 120

TCCCGCCCTC CATCTCCTAT TCCCCTCTCT TCTGGTTTAG TGCTTTAGGA AATGTGGCAG 180

AAATCTTTTT CTGCCTGTGT CTAGGAAATC ATAATTCATG CTGGCGTACC CTGGTTGTTG 240

AGGTCCCTGA ATCCTTGTGC CCACACTGCT GAAGACTCCT TGTGTGACAC AAGTCAGGGG 300

ACATCTGGGT CTTGACTCCC CAGATGCTCC AGGTGGACCC TGCTGCCCTC CCTTGCCCAC 360

CCTCTTCCAT TGTAG ATG CCA AGG GGC TGA GCG ATC CAG GGA AGA TCA AGC 411

Met Pro Arg Gly Ala Ile Gln Gly Arg Ser Ser

1 5 10

GGC TGC GTT CCC AGG TGC AGG TGA GCT TGG AGG ACT ACA TCA ACG ACC 459

Gly Cys Val Pro Arg Cys Arg Ala Trp Arg Thr Thr Ser Thr Thr

15 20 25

GCC AGT ATG ACT CGC GTG GCC GCT TTG GAG AGC TGC TGC TGC TGC TGC 507
Ala Ser Met Thr Arg Val Ala Ala Leu Glu Ser Cys Cys Cys Cys Cys
30 35 40

CCA CCT TGC AGA GCA TCA CGT GGC AGA TGA TCG AGC AGA TCC AGT TCA 555
Pro Pro Cys Arg Ala Ser Arg Gly Arg Ser Ser Arg Ser Ser Ser

45 50 55

TCA AGC TCT TCG GCA TGG CCA AGA TTG ACA ACC TGT TGG AGG AGA TGC 603
Ser Ser Ser Ser Ala Trp Pro Arg Leu Thr Thr Cys Trp Arg Arg Cys
60 65 70

TGC TGG GAGGTCCGTG CCAAGCCCAG GAGGGGCGGG GTTGGATTGG GGACTCCCCA 659
Cys Trp
75

GGAGACAGGC CTCACACAGT GAGCTCACCC CTCAGCTCCT TGGCTTCCCC ACTGTGCCGC 719

TTTGGGCAAG TTGCTTAACC TGTCTGTGCC TCAGTTTCCT CACCAGAAAA ATGGGAACAA 779

GGCAATGGTC TATTTGTTCA GGCACCGAGA ACCTAGCACG TGCCAGTCAC TGTTCTAAGT 839

GCTGGCAATT CAGCAAAGAA CAAGATCTTT GCCCTCGGGG AGGCTGTGTG TGTGTGATAT 899

GTATGGATGC GTGGATATCT GTGTATATGC CCGTATGTGC GTGCATGTGT ATATAAAGCC 959

TCACATTTTA TGATTTTGA 978

(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 75 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51:
Met Pro Arg Gly Ala Ile Gln Gly Arg Ser Ser Gly Cys Val Pro Arg
1 5 10 15

Cys Arg Ala Trp Arg Thr Thr Ser Thr Thr Ala Ser Met Thr Arg Val
20 25 30

Ala Ala Leu Glu Ser Cys Cys Cys Cys Cys Pro Pro Cys Arg Ala Ser
35 40 45

Arg Gly Arg Ser Ser Arg Ser Ser Ser Ser Ser Ser Ser Ala Trp Pro
50 55 60

Arg Leu Thr Thr Cys Trp Arg Arg Cys Cys Trp
65 70 75

(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 984 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: join(443..490, 494..595)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52:

GGGACACATA GATOCTATAA CTAGOTCACT TGGCTaCAOC AGACATCTGC GGGATOA3GC 60
TGAAAGGTGA GGCGGGACCA AATGGTTGAA GGACTTGCAC TCCAAGGAGC TTTGAGAGCC 120
ATTGATTACA TCCATTATGT TACTATGTGA CCAATACATT ACTCATTAGA ACATTTACGT 180
3ATCTCACAG CTTCCTTATA TOCACCTTGT TCCTTTCAAC TCACTTTTGT TCTCTTGGTT 240
TTTTGGGGTC CTCTTAACAC CCTCATGAAG TCTATAGATG GCAATGGTAC ACCCTAGTTT 300
ACTAACCCAG GAATAGGTAC CCAACAGGCA CTGCCAATAT TGGATGGGCT GGTTCATTGC 360

CCACGCCTGA GGAAGATGGC GTCCCAAGG^ CTra— -rr^T^.^ ^.^^^ 

^^^>^L:rC3^ CTGA^oTCTG CATCCCAGAC TCTCCATCCT 

GATCGACCT- CTCTACCTGC AG GGT CCC — nnr> 

A.G uAC CCC ATG CCC A^r 
Gly Pro Pro Ala Met His Pro Met Pro Thr
1 5 10

ACC CCC TGC ACC CTC ACC TGA TGC A-- ^ 520
Thr Pro Cys Thr Leu Thr Cys Arg Asn Ile Trp Glu Pro Thr Ser
15 20 25

TCG TTG CCA ACA CAA TGC CCA CTC ACC TCA GCA ACQ GAC AGA TGT GTG 568
Ser Leu Pro Thr Gln Cys Pro Leu Thr Ser Ala Thr Asp Arg Cys Val
30 35 40

AGT GGC CCC GAC CCA GGG GAC AGG CAG GTGGGCAAAC TCTGGGATTT 615
Ser Gly Pro Asp Pro Gly Asp Arg Gln
45 50

TACCTTGCAA AGGGTGAGGA TGGGGCTTAA GACAGGAGGC AGGAGAAAGT GGAGTCTAGA 675

AGGTAGAACC AGGA7GCAAC AGTTTTCTGG GTTCCAGGGT AGGGAATAAA GGGCAAGATT 735

GTCCATTTGT TGAGGCTGTT TATTCAGTAA GGTGACTGAC AGCCTTTACT GAATGAAGCC 795

ATTGTTGGGA TGAGGCAATC CACTGGATGA GGTAACCCAT TGGGTGAAGA TGTCTTGGGT 855

GAGAATTCCA TTAGTTGACA TTGTCCATTA AGTAAAAGTG GTCATTGAAG TAAGGCTGCA 915

CAGTTGGGTA AGGCTATCCA TTAGACATTA GATGAGACTA CCCATTGGGT CAGGATGTCT 975

GCTGGGCTA 984



(2) INFORMATION FOR SEQ ID NO: 53:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 50 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53:

Gly Pro Pro Ala Met His Pro Met Pro Thr Thr Pro Cys Thr Leu Thr
1 5 10 15

Cys Arg Asn Ile Trp Glu Pro Thr Ser Ser Leu Pro Thr Gln Cys Pro
20 25 30

Leu Thr Ser Ala Thr Asp Arg Cys Val Ser Gly Pro Asp Pro Gly Asp
35 40 45

Arg Gln
50



(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1103 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: CDS
(B) LOCATION: join(285..429, 433..*;'^'^, 451..492, 495..6: ..630, 634.."'SO, 754..810, 814..843, 84"^..1023, 102^..1071, 1075..1103)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54:

TGGGAGAA GCAGTCCAAG TCTGCATATC AAATAAATGA TGGAGGAGAT GGG7GGTAGG 60

ACCTTCCAGA CCTCATAAAA CTTAGGCTTT ATGATCTGGG ACTCACAGAA GGTTGAGCAA 120

TAAAAGACCT TAGGGATTAT CTGGCTTAAT TAATTCTCTC ATTTTATAGA GGAAGAAATT 180

AAGTCAAGGT GGGGCAGGGT GGGAGGGGAG AACTTTCCCG GGGCTCTTCA TTTACTCCCA 240

CAAAGGCTGG AATTTTGAGC AGCCCCTGTC TGTCTGTTTG TCCTTCCA GCC ACC CCT 297
Ala Thr Pro
1

GAG ACC CCA CAG CCC TCA CCG CCA GGT GGC TCA GGG TCT GAG CCC TAT 345
Glu Thr Pro Gln Pro Ser Pro Pro Gly Gly Ser Gly Ser Glu Pro Tyr
5 10 15

AAG CTC CTG CCG GGA GCC GTC GCC ACA ATC GTC AAG CCC CTC TCT GCC 393
Lys Leu Leu Pro Gly Ala Val Ala Thr Ile Val Lys Pro Leu Ser Ala
20 25 30 35

ATC CCC CAG CCG ACC ATC ACC AAG CAG GAA GTT ATC TAG CAA GCC GCT 441
Ile Pro Gln Pro Thr Ile Thr Lys Gln Glu Val Ile Gln Ala Ala
40 45 50

GGG GCT TGG GGG CTC CAC TGG CTC CCC CCA GCC CCC TAA GAG AGC ACC 489
Gly Ala Trp Gly Leu His Trp Leu Pro Pro Ala Pro Glu Ser Thr
55 60 65

TGG TGA TCA CGT GGT CAC GGC AAA GGA AGA CGT GAT GCC AGG ACC AGT 537
Trp Ser Arg Gly His Gly Lys Gly Arg Arg Asp Ala Arg Thr Ser
70 75 80

CCC AGA GCA GGA ATG GGA AGG ATG AAG GGC CCG AGA ACA TGG CCT AAG 585
Pro Arg Ala Gly Met Gly Arg Met Lys Gly Pro Arg Thr Trp Pro Lys
85 90 95

GCA CAT CCC ACT GCA CCC TGA CGC CCT GCT CTG ATA ACA AGA CTT 630
Ala His Pro Thr Ala Pro Arg Pro Ala Leu Ile Thr Arg Leu
100 105 110

TGA CTT GGG GAG ACC CTC TAC TGC CTT GGA CAA CTT TCT CAT GTT GAA 676
Leu Gly Glu Thr Leu Tyr Cys Leu Gly Gln Leu Ser His Val Glu
115 120 125

GCC ACT GCC TTC ACC TTC ACC TTC ATC CAT GTC CAA CCC CCG ACT TCA 726
Ala Thr Ala Phe Thr Phe Thr Phe Ile His Val Gln Pro Pro Thr Ser
130 135 140

TCC CAA AGG ACA GCC GCC TGG AGA TGA CTT GAG CCT TAC TTA AAC CCA 774
Ser Gln Arg Thr Ala Ala Trp Arg Leu Glu Pro Tyr Leu Asn Pro
145 150 155

GCT CCC TTC TTC CCT AGC CTG GTG CTT CTC CTC TCC TAG CCC CGG TCA 822
Ala Pro Phe Phe Pro Ser Leu Val Leu Leu Leu Ser Pro Arg Ser
160 165 170

TGG TGT CCA GAC AGA GCC CTG TGA GGC TGG GTC CAA TTG TGG CAC TTG 870
Trp Cys Pro Asp Arg Ala Leu Gly Trp Val Gln Leu Trp His Leu
175 180 185

GGG CAC CTT GCT CCT CCT TCT GCT GCT GCC CCC ACC TCT GCT GCC TCC 918
Gly His Leu Ala Pro Pro Ser Ala Ala Ala Pro Thr Ser Ala Ala Ser
190 195 200

CTC TGC TGT CAC CTT GCT CAG CCA TCC CGT CTT CTC CAA CAC CAC CTC 966
Leu Cys Cys His Leu Ala Gln Pro Ser Arg Leu Leu Gln His His Leu
205 210 215

TAC AGA GGC CAA GGA GGC CTT GGA AAC GAT TCC CCC AGT CAT TCT GGG 1014
Tyr Arg Gly Gln Gly Gly Leu Gly Asn Asp Ser Pro Ser His Ser Gly
220 225 230

AAC ATG TTG TAA GCA CTG ACT GGG ACC AGG CAC CAG GCA GGG TCT AGA 1062
Asn Met Leu Ala Leu Thr Gly Thr Arg His Gln Ala Gly Ser Arg
235 240 245

AGG CTG TGG TGA GGG AAG ACG CCT TTC TCC TCC AAC CCA AC 1103
Arg Leu Trp Gly Lys Thr Pro Phe Ser Ser Asn Pro
250 255 260



(2) INFORMATION FOR SEQ ID NO: 55:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 261 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55:

Ala Thr Pro Glu Thr Pro Gln Pro Ser Pro Pro Gly Gly Ser Gly Ser
1 5 10 15

Glu Pro Tyr Lys Leu Leu Pro Gly Ala Val Ala Thr Ile Val Lys Pro
20 25 30

Leu Ser Ala Ile Pro Gln Pro Thr Ile Thr Lys Gln Glu Val Ile Gln
35 40 45

Ala Ala Gly Ala Trp Gly Leu His Trp Leu Pro Pro Ala Pro Glu Ser
50 55 60

Thr Trp Ser Arg Gly His Gly Lys Gly Arg Arg Asp Ala Arg Thr Ser
65 70 75 80

Pro Arg Ala Gly Met Gly Arg Met Lys Gly Pro Arg Thr Trp Pro Lys
85 90 95

Ala His Pro Thr Ala Pro Arg Pro Ala Leu Ile Thr Arg Leu Leu Gly
100 105 110

Glu Thr Leu Tyr Cys Leu Gly Gln Leu Ser His Val Glu Ala Thr Ala
115 120 125

Phe Thr Phe Thr Phe Ile His Val Gln Pro Pro Thr Ser Ser Gln Arg
130 135 140

Thr Ala Ala Trp Arg Leu Glu Pro Tyr Leu Asn Pro Ala Pro Phe Phe
145 150 155 160

Pro Ser Leu Val Leu Leu Leu Ser Pro Arg Ser Trp Cys Pro Asp Arg
165 170 175

Ala Leu Gly Trp Val Gln Leu Trp His Leu Gly His Leu Ala Pro Pro
180 185 190

Ser Ala Ala Ala Pro Thr Ser Ala Ala Ser Leu Cys Cys His Leu Ala
195 200 205

Gln Pro Ser Arg Leu Leu Gln His His Leu Tyr Arg Gly Gln Gly Gly
210 215 220

Leu Gly Asn Asp Ser Pro Ser His Ser Gly Asn Met Leu Ala Leu Thr
225 230 235 240

Gly Thr Arg His Gln Ala Gly Ser Arg Arg Leu Trp Gly Lys Thr Pro
245 250 255

Phe Ser Ser Asn Pro
260



(2) INFORMATION FOR SEQ ID NO: 56:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56:
GGGCACTGGG AGGAGGCAGT 



(2) INFORMATION FOR SEQ ID NO: 57: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 20 base pairs 
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57:

(2) INFORMATION FOR SEQ ID NO: 58:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58:
TCTGGTGTGC ACGACTGCAC 

(2) INFORMATION FOR SEQ ID NO: 59:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59:
CTGGAGCTGC AGCCTCATAC 

(2) INFORMATION FOR SEQ ID NO: 60: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60:
AAGGCTCCCT TAGATGCCTG 

(2) INFORMATION FOR SEQ ID NO: 61:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 23 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61:
CCACTCAGGG AGAAGACAGA CCT 

(2) INFORMATION FOR SEQ ID NO: 62:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62:
CCTAGTTCTG TCCTAA3AGG 

(2) INFORMATION FOR SEQ ID NO: 63:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63:
GTCATAAAGT GTGGCTACAG 



(2) INFORMATION FOR SEQ ID NO: 64:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64:
CCACCCCCTA CTCCATCCCT GT 22

(2) INFORMATION FOR SEQ ID NO: 65: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65:
'CCTCCCGTC AGCTGCTCCA 

20 

(2) INFORMATION FOR SEQ ID NO: 66:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66:

GTGOAGGGGA CAGAGAATGC 



(2) INFORMATION FOR SEQ ID NO: 67:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67: 
AATCAAGCCA GTCCACGGCT AT 



(2) INFORMATION FOR SEQ ID NO: 68:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 23 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68:
GCCCAGCGTC ACTGAGTTGG CTA 



(2) INFORMATION FOR SEQ ID NO: 69:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69:
TTGCCTGGGT GAGTGCCATG 



(2) INFORMATION FOR SEQ ID NO: 70: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY : linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70:
GCACCAGCTA TCTTGCCAAC

(2) INFORMATION FOR SEQ ID NO: 71:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71:

AGGAGAAGTC TGGCAGAGCG 

(2) INFORMATION FOR SEQ ID NO: 72: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 72:
CTCCTTGTGT GACACAAGTC 

(2) INFORMATION FOR SEQ ID NO: 73:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 73:

CTCACTGTGT GAGGCCTGTC 

(2) INFORMATION FOR SEQ ID NO: 74: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 20 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(XI) SEQUENCE DESCRIPTION: SEQ ID NO: 74: 
TGGTTGATTG GCCACGCCTG 

(2) INFORMATION FOR SEQ ID NO: 75:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 75:
ATCCTGGTTC TACCTTCTAG 



(2) INFORMATION FOR SEQ ID NO: 76:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 76:
CATTTACTCC CACAAAGGCT 20



(2) INFORMATION FOR SEQ ID NO: 77:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 77: 
GACCACGTGA TCACCAGGTG 



(2) INFORMATION FOR SEQ ID NO: 78: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1441 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 20..1414

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 78:
CTCCAAAACC CTCGTCGAC ATG GAC ATG GCC GAC TAC AGT GCT GCA CTG GAC 52
Met Asp Met Ala Asp Tyr Ser Ala Ala Leu Asp
1 5 10

CCA GCC TAC ACC ACC CTG GAA TTT GAG AAT GTG CAG GTG TTG ACG ATG 100
Pro Ala Tyr Thr Thr Leu Glu Phe Glu Asn Val Gln Val Leu Thr Met
15 20 25

GGC AAT GAT AGG TCC CCA TCA GAA GGC ACC AAC C AAC GCG CCC AAC 148
Gly Asn Asp Thr Ser Pro Ser Glu Gly Thr Asn Leu Asn Ala Pro Asn
30 35 40

AGC CTG GGT GTC AGC GCC CTG TGT GCC ATC TGC GGG GAC CGG GCC ACG 196
Ser Leu Gly Val Ser Ala Leu Cys Ala Ile Cys Gly Asp Arg Ala Thr
45 50 55

GGC AAA CAC TAC GGT GCC TCG AGC TGT GAC GGC TGC AAG GGC TTC TTC 244
Gly Lys His Tyr Gly Ala Ser Ser Cys Asp Gly Cys Lys Gly Phe Phe
60 65 70 75

CGG AGG AGC GTG CGG AAG AAC CAC ATG TAC TCC TGC AGA TTT AGC CGG 292
Arg Arg Ser Val Arg Lys Asn His Met Tyr Ser Cys Arg Phe Ser Arg
80 85 90

CAG TGC GTG GTG GAC AAA GAC AAG AGG AAC CAG TGC CGC TAC TGC AGG 340
Gln Cys Val Val Asp Lys Asp Lys Arg Asn Gln Cys Arg Tyr Cys Arg
95 100 105

CTC AAG AAA TGC TTC CGG GCT GGC ATG AAG AAG GAA GCC GTC CAG AAT 388
Leu Lys Lys Cys Phe Arg Ala Gly Met Lys Lys Glu Ala Val Gln Asn
110 115 120

GAG CGG GAC CGG ATC AGC ACT CGA AGG TCA AGC TAT GAG GAC AGC AGC 436
Glu Arg Asp Arg Ile Ser Thr Arg Arg Ser Ser Tyr Glu Asp Ser Ser
125 130 135

CTG CCC TCC ATC AAT GCG CTC CTG CAG GCG GAG GTC CTG TCC CGA CAG 484
Leu Pro Ser Ile Asn Ala Leu Leu Gln Ala Glu Val Leu Ser Arg Gln
140 145 150 155

ATC ACC TCC CCC GTC TCC GGG ATC AAC GGC GAC ATT CGG GCG AAG AAG 532
Ile Thr Ser Pro Val Ser Gly Ile Asn Gly Asp Ile Arg Ala Lys Lys
160 165 170

ATT GCC AGC ATC GCA GAT GTG TGT GAG TCC ATG AAG GAG CAG CTG CTG 580
Ile Ala Ser Ile Ala Asp Val Cys Glu Ser Met Lys Glu Gln Leu Leu
175 180 185

GTT CTC GTT GAG TGG GCC AAG TAC ATC CCA GCT TTC TGC GAG CTC CCC 628
Val Leu Val Glu Trp Ala Lys Tyr Ile Pro Ala Phe Cys Glu Leu Pro
190 195 200

CTG GAC GAC CAG GTG GCC CTG CTC AGA GCC CAT GCT GGC GAG CAC CTG 676
Leu Asp Asp Gln Val Ala Leu Leu Arg Ala His Ala Gly Glu His Leu
205 210 215

CTG CTC GGA GCC ACC AAG AGA TCC ATG GTG TTC AAG GAC GTG CTG CTC 724
Leu Leu Gly Ala Thr Lys Arg Ser Met Val Phe Lys Asp Val Leu Leu
220 225 230 235

CTA GGC AAT GAC TAC ATT GTC CCT CGG CAC TGC CCG GAG CTG GCG GAG 772
Leu Gly Asn Asp Tyr Ile Val Pro Arg His Cys Pro Glu Leu Ala Glu
240 245 250

ATG AGC CGG GTG TCC ATA CGC ATC CTT GAC GAG CTG GTG CTG CCC TTC 820
Met Ser Arg Val Ser Ile Arg Ile Leu Asp Glu Leu Val Leu Pro Phe
255 260 265

CAG GAG CTG CAG ATC GAT GAC AAT GAG TAT GCC TAG CTC AAA GCC ATC 868
Gln Glu Leu Gln Ile Asp Asp Asn Glu Tyr Ala Tyr Leu Lys Ala Ile
270 275 280

GAC CCA GAT GCC AAG GGG CTG AGC GAT CCA GGG AAG ATC 916
Ile Phe Phe Asp Pro Asp Ala Lys Gly Leu Ser Asp Pro Gly Lys Ile
285 290 295

AAG CGG CTG CGT TCC CAG GTG CAG GTG AGC TTG GAG GAC TAC ATC AAC 964
Lys Arg Leu Arg Ser Gln Val Gln Val Ser Leu Glu Asp Tyr Ile Asn
300 305 310 315

GAC CGC CAG TAT GAC TCG CGT GGC CGC TTT GGA GAG CTG CTG CTG CTG 1012
Asp Arg Gln Tyr Asp Ser Arg Gly Arg Phe Gly Glu Leu Leu Leu Leu
320 325 330

CTG CC- Arc TTG CAG AGC ATC ACC TGG CAG ATG ATC GAG CAG ATC CAG 1060
Leu Pro Thr Leu Gln Ser Ile Thr Trp Gln Met Ile Glu Gln Ile Gln
335 340 345

TTC ATC AAG CTC TTC GGC ATG GCC AAG ATT GAC AAC CTG TTG CAG GAG 1108
Phe Ile Lys Leu Phe Gly Met Ala Lys Ile Asp Asn Leu Leu Gln Glu
350 355 360

ATG CTG CTG GGA GGG TCC CCC AGC GAT GCA CCC CAT GCC CAC CAC CCC 1156
Met Leu Leu Gly Gly Ser Pro Ser Asp Ala Pro His Ala His His Pro
365 370 375

CTG CAC CCT CAC CTG ATG CAG GAA CAT ATG GGA ACC AAC GTC ATC GTT 1204
Leu His Pro His Leu Met Gln Glu His Met Gly Thr Asn Val Ile Val
380 385 390 395

GCC AAC ACA ATG CCC ACT CAC CTC AGC AAC GGA CAG ATG TGT GAG TGG 1252
Ala Asn Thr Met Pro Thr His Leu Ser Asn Gly Gln Met Cys Glu Trp
400 405 410

CCC CGA CCC AGG GGA CAG GCA GCC ACC CCT GAG ACC CCA CAG CCC TCA 1300
Pro Arg Pro Arg Gly Gln Ala Ala Thr Pro Glu Thr Pro Gln Pro Ser
415 420 425

CCG CCA GGT GCG TCA GGG TCT GAG CCC TAT AAG CTC CTG CCG GGA GCC 1348
Pro Pro Gly Ala Ser Gly Ser Glu Pro Tyr Lys Leu Leu Pro Gly Ala
430 435 440

GTC GCC ACA ATC GTC AAG CCC CTC TCT GCC ATC CCC CAG CCG ACC ATC 1396
Val Ala Thr Ile Val Lys Pro Leu Ser Ala Ile Pro Gln Pro Thr Ile
445 450 455

ACC AAG CAG GAA GTT ATC TAGCAAGCCG CTGGGGCTTG GGGGCTC 1441
Thr Lys Gln Glu Val Ile
460 465

(2) INFORMATION FOR SEQ ID NO: 79:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 465 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 79:



Met Asp Met Ala Asp Tyr Ser Ala Ala Leu Asp Pro Ala Tyr Thr Thr
1 5 10 15

Leu Glu Phe Glu Asn Val Gln Val Leu Thr Met Gly Asn Asp Thr Ser
20 25 30

Pro Ser Glu Gly Thr Asn Leu Asn Ala Pro Asn Ser Leu Gly Val Ser
35 40 45

Ala Leu Cys Ala Ile Cys Gly Asp Arg Ala Thr Gly Lys His Tyr Gly
50 55 60

Ala Ser Ser Cys Asp Gly Cys Lys Gly Phe Phe Arg Arg Ser Val Arg
65 70 75 80

Lys Asn His Met Tyr Ser Cys Arg Phe Ser Arg Gln Cys Val Val Asp
85 90 95

Lys Asp Lys Arg Asn Gln Cys Arg Tyr Cys Arg Leu Lys Lys Cys Phe
100 105 110

Arg Ala Gly Met Lys Lys Glu Ala Val Gln Asn Glu Arg Asp Arg Ile
115 120 125

Ser Thr Arg Arg Ser Ser Tyr Glu Asp Ser Ser Leu Pro Ser Ile Asn
130 135 140

Ala Leu Leu Gln Ala Glu Val Leu Ser Arg Gln Ile Thr Ser Pro Val
145 150 155 160

Ser Gly Ile Asn Gly Asp Ile Arg Ala Lys Lys Ile Ala Ser Ile Ala
165 170 175

Asp Val Cys Glu Ser Met Lys Glu Gln Leu Leu Val Leu Val Glu Trp
180 185 190

Ala Lys Tyr Ile Pro Ala Phe Cys Glu Leu Pro Leu Asp Asp Gln Val
195 200 205

Ala Leu Leu Arg Ala His Ala Gly Glu His Leu Leu Leu Gly Ala Thr
210 215 220

Lys Arg Ser Met Val Phe Lys Asp Val Leu Leu Leu Gly Asn Asp Tyr
225 230 235 240

Ile Val Pro Arg His Cys Pro Glu Leu Ala Glu Met Ser Arg Val Ser
245 250 255

Ile Arg Ile Leu Asp Glu Leu Val Leu Pro Phe Gln Glu Leu Gln Ile
260 265 270

Asp Asp Asn Glu Tyr Ala Tyr Leu Lys Ala Ile Ile Phe Phe Asp Pro
275 280 285

Asp Ala Lys Gly Leu Ser Asp Pro Gly Lys Ile Lys Arg Leu Arg Ser
290 295 300

Gln Val Gln Val Ser Leu Glu Asp Tyr Ile Asn Asp Arg Gln Tyr Asp
305 310 315 320

Ser Arg Gly Arg Phe Gly Glu Leu Leu Leu Leu Leu Pro Thr Leu Gln
325 330 335

Ser Ile Thr Trp Gln Met Ile Glu Gln Ile Gln Phe Ile Lys Leu Phe
340 345 350

Gly Met Ala Lys Ile Asp Asn Leu Leu Gln Glu Met Leu Leu Gly Gly
355 360 365

Ser Pro Ser Asp Ala Pro His Ala His His Pro Leu His Pro His Leu
370 375 380

Met Gln Glu His Met Gly Thr Asn Val Ile Val Ala Asn Thr Met Pro
385 390 395 400

Thr His Leu Ser Asn Gly Gln Met Cys Glu Trp Pro Arg Pro Arg Gly
405 410 415

Gln Ala Ala Thr Pro Glu Thr Pro Gln Pro Ser Pro Pro Gly Ala Ser
420 425 430

Gly Ser Glu Pro Tyr Lys Leu Leu Pro Gly Ala Val Ala Thr Ile Val
435 440 445

Lys Pro Leu Ser Ala Ile Pro Gln Pro Thr Ile Thr Lys Gln Glu Val
450 455 460

Ile
465



(2) INFORMATION FOR SEQ ID NO : 80: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2329 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 80:
GGGGCCCTGA TTCACGGGCC GCTGGGGCAG GGTTGGGGGT TGGGGGTGCC CACAGGGTTG 60
GCTAGTGGGG TTTTGGGGGG GCAGTGGGTG CAAGGAGTTT GGTTTGTGTC TGCCGGCCGG 120
CAGGCAAACC CAACCACGCG GTGGGGGAGG CG 3 CTAGL GT o J i. \jG^GGG c -J ^ - J G ^ G 180
CTGTGGCAGC TTTCTAAACT GAGCCAGCT3 ^AGAwGGAGC * ^ ^ * oG^GGG 240
CGTGCTCGAG TCAGGGCTGA GCAAAGAGGC AGTGATCCAG G ^?xGT^GGTG Ao C^GGGGCG 300
CTACCTCGTG GCTGGAGAAG GCCCCCTGGA CAAGGGGGAG TCCTGCGGCG J — vj'o - CvjAGG 360
GGAGCTGGCT GAGCTGCCCA ATGGGCTGGG GGAGACTCGG GGCTGQGAGG AGGAGACGGA 420
CGACGATGGG GAAGACTTCA CGGCACCCAT CCTCAAAGAG CTGGAGAACC TCAGGGCTGA 480
GGAGGCGGCC CACCAGAAAG CCGTGGTGGA GACCCTTCTG CAGGAGGACC CGTGGCGTGT 540
GGCGAAGATG GTCAAGTCCT ACCTGCAGCA GCACAACATC CCACAGCGGG AGGTGGTCGA 600
TACCACTGGC CTCAACCAGT CCCACCTGTC CCAACACCTC AACAAGGGCA CTCCCATGAA 660
GACGCAGAAG CGGGCCGCCC TGTACACCTG GTACGTCCGC AAGCAGCGAG AGGTGGCGCA 720
GCAGTTCACC CATGCAGGGC AGGGAGGGCT GATTGAAGAG CCCACAGGTG ATGAGCTACC 780
AACCAAGAAG GGGCGGAGGA ACCGTTTCAA GTGGGGCCCA GCATCCCAGC AGATCCTGTT 840
CCAGGCCTAT GAGAGGCAGA AGAACCCTAG CAAGGAGGAG CGAGAGACGC TAGTGGAGGA 900
GTGCAATAGG GCGGAATGCA TCCAGAGAGG GGTGTCCCCA TCACAGGCAC AGGGGCTGGG 960
CTCCAACCrC GTCACGGAGG TGCGTGTCTA CAACTGGTTT GCCAACCGGC GCAAAGAAGA 1020
AGCCTTCCGG CACAAGCTGG CCATGGACAC GTACAGCGGG CCCCCCCCAG GGCCAGGCCG 1080
GGGACCTGCG CTGCCCGCTC ACAGCTCCCC TGGCCTGCCT CCACCTGCCC TCTCCCCCAG 1140
TAAGGTCCAC GGTGTGCGCT ATGGACAGCC TGCGACCAGT GAGACTGCAG AAGTACCCTC 1200
GGTCCCTTAG TGACAGTGTC TACACCCCTC CACCAAGTGT CCCCCACGGG 1260
CCTGGAGCCC AGCCACAGCC TGCTGAGTAC AGAAGCCAAG CTGGTCTCAG CAGCTGGGGG 1320
cccccrcccc CCTGTCAGCA CCCTGACAGC ACTGCACAGC TTGGAGCAGA CATCCCCAGG 1380
CCTCAACCAG CAGCCCCAGA ACCTCATCAT GGCCTCACTT CCTGGGGTCA TGACCATCGG 1440
GCCTGGTGAG CCTGCCTCCC TGGGTCCTAC GTTCACCAAC ACAGGTGCCT CCACCCTGGT 1500
CATCGGCCTG GCCTCCACGC AGGCACAGAG TGTGCCGGTC ATCAACAGCA TGGGCAGCAG 1560
CCTGACCACC CTGCAGCCCG TCCAGTTCTC CCAGCCGCTG CACCCCTCCT A'^CAGTAn^r 1620
GCTCATGCCA CCTGTGCAGA GCCATGTGAC CCAGAGCCCC TTCATGGCCA CCATGGCTCA 1680
GCTGCAGAGC CCCCACGGTG AGCACCCTGT GCCCCACACA GCAGGAGATG ATGATAGAGG 1740
TTGGCTGTCA ATGGATGCAG GGGAAAGGGG TGGCTGGCAG GCATTGCAGT CTGCATGTGT 1800
CTCTGGGACA AGTGTTTTTC CGTGATTGA' 'GTG r •TCCCATGTG 1860
AATGCACGTA TCTGTGTGTG TGCACGACTG CTTGTGTGAG CAGATCCCTA GTCGTGTCTG 1920
GGTGTGTATC GGTTGTGCAT GCATTTGTGT GCATCCTGTG TTTCTCTGAA ACTGTTAGGG 1980
CCATATGAAT TTCTAAAATC TATTCAGATT TTAGAAAGGT AATCTGGGGC CAGGCGTGGT 2040
GGGTCATGCC TGTAATCCCA GCACTTTGGA AGGCCGAGGT GGGCAGATCA CTTGAGGTCA 2100
GGAGTTCAAG ACCAGCCTGG CCAACACGGT GAAACCCCGT CTCTACTAAA AGTACAAAAA 2160
TTAGCCAGGC GTGGAGCACG TGCGTGTAGT GCCAGCTACT TGGGAGQCTG AGGCAGAATC 2220
GCTTGAACCT GGGAGGCGGA GGTTGCAGTG AGCTGAGATT TGGCCACTGC ACTGCACTCC 2280
AGCCTGGGCA ACAGAGTGAG TACTCTGCCA AAAAAAAAAA AAAAAAAAA 2329



(2) INFORMATION FOR SEQ ID NO: 81:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 81:

CACCTGGTGA TCACGTGGTC 



(2) INFORMATION FOR SEQ ID NO: 82: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 82:
GTAAGGCTCA AGTCATCTCC 20



(2) INFORMATION FOR SEQ ID NO: 83:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 83:

Glu Gly Cys Lys Gly
1 5

(2) INFORMATION FOR SEQ ID NO: 84:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 84:

Glu Gly Cys Lys Ala 
1 5 

(2) INFORMATION FOR SEQ ID NO : 85: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 85:

Asp Gly Cys Lys Gly
1 5






(2) INFORMATION FOR SEQ ID NO: 86:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 36 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..36

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 86:

GAC ACG TAC AGC GGC CCC CCC CCA GGG CCA GGC CCG
Asp Thr Tyr Ser Gly Pro Pro Pro Gly Pro Gly Pro



(2) INFORMATION FOR SEQ ID NO: 87:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 12 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 87:

Asp Thr Tyr Ser Gly Pro Pro Pro Gly Pro Gly Pro
1 5 10



(2) INFORMATION FOR SEQ ID NO: 88:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 36 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..36

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 88:

GAC ACG TAC AGC GGC CCC CCC CCC AGG GCC AGG CCC 36
Asp Thr Tyr Ser Gly Pro Pro Pro Arg Ala Arg Pro
1 5 10



(2) INFORMATION FOR SEQ ID NO : 89: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 12 amino acids 

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 89:

Asp Thr Tyr Ser Gly Pro Pro Pro Arg Ala Arg Pro
1 5 10



(2) INFORMATION FOR SEQ ID NO: 90:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY : linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 90: 
CATGAACCCC GAAGAGTGGT G 21 



(2) INFORMATION FOR SEQ ID NO: 91: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 91:

GCCTCCAGAC ACCTGTTACT 

(2) INFORMATION FOR SEQ ID NO: 92:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 92:

GGCGATCATG GCAAGTTAGA AG 

(2) INFORMATION FOR SEQ ID NO: 93: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 93:
TTGGTGAGAG TATGGAAGAC C 21

(2) INFORMATION FOR SEQ ID NO: 94:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 94:
GGGGTTTGCT TGTGAAACTC C 21

(2) INFORMATION FOR SEQ ID NO: 95:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 95:
TTGGTGGGAA ACGGGCTTGG 20

(2) INFORMATION FOR SEQ ID NO: 96:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 96:
CTCCCACTAG TACCCTAACC 20

(2) INFORMATION FOR SEQ ID NO: 97:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 97: 
GAGAGGGCAA AGGTCACTTC AG 22 

(2) INFORMATION FOR SEQ ID NO: 98:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 98: 
AGTGAAGGCT ACAGACCCTA TC 22 

(2) INFORMATION FOR SEQ ID NO: 99:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 99:

TTCCTGGGTC TGTGTACTTG C 21 

(2) INFORMATION FOR SEQ ID NO: 100: 




(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 100:

TGTGTTTTGG GCCAAGCACC A 



(2) INFORMATION FOR SEQ ID NO : 101: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 101:
AACCAGATAA GATCCGTGGC 



(2) INFORMATION FOR SEQ ID NO : 102: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(XI) SEQUENCE DESCRIPTION: SEQ ID NO: 102: 
AACCAGACTC ACAGCCTGAA CC 



(2) INFORMATION FOR SEQ ID NO: 103: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 103:

TCACAGGGCA ATGGCTGAAC 



(2) INFORMATION FOR SEQ ID NO: 104: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 104:
TGCCGAGTCA TTGTTCCAGG 

(2) INFORMATION FOR SEQ ID NO: 105: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 105:
CCTCTTATCT TATCAGCTCC AG 

(2) INFORMATION FOR SEQ ID NO: 106: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 106:

CTGCTCTTTG TGGTCCAAGT CC 






(2) INFORMATION FOR SEQ ID NO: 107:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 107:



GAGTTTGAAG GAGACCTACA G 



(2) INFORMATION FOR SEQ ID NO : 108: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 108:



ATCCACCTCT CCTTATCCCA G 






(2) INFORMATION FOR SEQ ID NO: 109:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 109:
ACTTCCGAGA AAGTTCAGAC C 

(2) INFORMATION FOR SEQ ID NO: 110:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 110:
TTTGCCTGTG TATGCACCTT G 

(2) INFORMATION FOR SEQ ID NO: 111: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 111:
GCCGAGTCCA TGCTTGCCAC 

(2) INFORMATION FOR SEQ ID NO : 112: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 112:
CTTTGCTGGT TGAGTTGGGC 

(2) INFORMATION FOR SEQ ID NO : 113: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 113:
TTCCATGACA GCTGCCCAGA G 



(2) INFORMATION FOR SEQ ID NO: 114:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 114:
TAAAGGTTGG AGCCCCTCTG 



(2) INFORMATION FOR SEQ ID NO: 115:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 115:

TTGTAAGGTG ACCCCATCAG 



(2) INFORMATION FOR SEQ ID NO: 116:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 116:
TTGGTGATGT CCAGAAGTCC 



(2) INFORMATION FOR SEQ ID NO: 117: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 20 base pairs 
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 117:
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CAGAATGTGT CA3AGTTJGC ~> f- 

(2) INFORMATION FOR SEQ ID NO: 118:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 118:
CTCCCTCCTG TTCTTAAGTG 20

(2) INFORMATION FOR SEQ ID NO: 119: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 119:
CTGGACTCCC AGTTCAGTCA 20

(2) INFORMATION FOR SEQ ID NO: 120:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 120:
CAAGGATCCA GAAGATTGGC 20

(2) INFORMATION FOR SEQ ID NO: 121: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE : nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY : linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 121: 
CGTCCTCTGG GAAGATCTGC 20



(2) INFORMATION FOR SEQ ID NO: 122:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 122:
GCAACAGAGC AAGACTCCAT CTCA 24



(2) INFORMATION FOR SEQ ID NO: 123:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 123:
GAGTTTAATG GAAGAACTAA CC 22



(2) INFORMATION FOR SEQ ID NO: 124: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 23 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 124:
CCTCATGGAG AAACATCCTA AGT 23



(2) INFORMATION FOR SEQ ID NO: 125:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 125:
AGGGAGTGCA CGGCTGAGCT CCTG 24



(2) INFORMATION FOR SEQ ID NO: 126: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6254 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 1287..4273
(D) OTHER INFORMATION: /note= "N = A or G or C or T"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 126:
AGCCA3CACT GTTCTTGGCA CATGGTAATC TTAACATATT TTTTCCTACA GGGAGG 60
GTGTGAGGCC GGGAGTGGGG TGGAAGGGTC CCAAAATGGA TGGAAGGGCC CCAAAATGGC 120
CGTGAGCATC CTCTGCCCTT GAGAAGAGCT AGCCCAGCTG TCTAGAGCTC CCTGCTGCTG 180
CCGCTCTCGT AAGCAGCAAG CATTTTTGGC TCTCCTGTCT CAGCATGATG CCCCTACAAG 240
GTTCTTTCGG GGGTGGGACC CAACGCTGCT CTCCTGATGG CCTCCCTGGC TCCCAGCACC 300
TTCCATCCCA GCTGCTCAGG GCCCCZ'CACC TGCGCCTCCC CCACCCTCCC CTCTGCCCAC 360
TCCCATGGCA GGCCATAGCT CCCTGTCCCT CTCCGCTGCC ATGAGGCCTG CACTTTGCAG 420
GGCTGAAGTC CAAAGTTCAG TCCCTTCGGT AAGCACACGG ATAAATATGA ACCTTGGAGA 480
ATTTCCGCAG CTCCAATGTA AACAGAACAG 3CAGG0GCCC TGATTCACGG GCCGCTGGGG 540
CCAGGGTTGG GGGTTGGGGG TGCCCACAGG GCTTGGCTAG TGGGGTTTTG GGGGGGCAGT 600
GGGTGCAAGG AGTTTGGTTT GTGrCTGCCG GCCGGCAGGC AAACGCAACC CACGCGGTGG 660
GGGAGGCGGC TAGCGTGGTG GACCCGGGCC GCGTGGCCCT GTGGCAGCCG AGCCATGGTT 720
TCTAAACTGA GCCAGCTGCA GACGGAGCTC CTGGCGGCCC TGCTCGAGTC AGGGCTGAGC 780
AAAGAGGCAC TGATCCAGGC ACTGGGTGAG CCGGGGCCCT ACCTCCTGGC TGGAGAAGGC 840
CCCCTGGACA AGGGGGAGTC CTGCGGCGGC GGTCGAGGGG AGGTGGCTGA GCTGCCCAAT 900
GGGCTGGGGG AGACTCGGGG CTCCGAGGAC GAGACGGACG ACGATGGGGA AGACTTCACG 960
CCACCCATCC TCAAAGAGCT GGAGAACCTC AGCCCTGAGG AGGCGGCCCA CCAGAAAGCC 1020
GTGGTGGAGA CCCTTCTGCA GTAAGGAGCC CTGCCCCGTC CCCGCTCCCA GGAGAGCCTA 1080
GAGGGGCCCC CCTCAGCTCC TAACGAGCCC CCCTTCTGAG TTGAGTCCCC ATGACCTTCA 1140
GCCTTTAGCC TAGTTGCTGG GAAGGGGGAC AGGGCCCATG AGAGCCCAGG GGTCCTTGCT 1200
TGGAGGTTTG AGCCTCCAGC CCCTGAACTG CTCCTCTGCA GAGTCCCAAA TGCCATGAGC 1260
CCAGGCCTTT AGCCCAGTCC TTGGGCNAGG GGGACATTTC CCAGGGGGTC CAAGATGGGA 1320
GAAAAAGCAG TGAATTCACA ACTCAAATGC CCACCCACCC ATCCATCCAT CCGTCCATCC 1380
ACCCATTCAT CCATTCATCC ATTCACCCAT CCATCCATCC ACATATCTTC ATCTGTGTTG 1440



TGTGTCTGTG TATCCATGTT TCTAAACCTT TATCTGTTCC AGTGTCTGTA TCCATAGGCC 1500
TGTGTCCACG TTTGTCATGT GTGTGCGTCN ACAAGTCTCT GTCCTCATGA CCATGTGTCT 1560
GTGTCCCTGT GTCCTGGCAT AAATGACCAT ACCTCACCGT CCCTGAGTC7 ATGTGTAGGC 1620
CCCTGGGCTC CATAACTGCT TTGATGCACA GTCCCCACCG TGAGAGTTGA CAAGGTTCCA 1680
GCACCCAGGA CCGCAGCCCC ACCTATGGGG AGAGACAGCC CTTGCTGAGC AGATCCCGTC 1740
CTTGCCCTCT CCCAGGGAGG ACCCGTGGCG TGTGGCGAAG ATGGTCAAGT CCTACCTGCA 1800
GCAGCACAAC ATCCCACAGC GGGAGGTGG" CGATACCACT GGCCTCAACC AGTCCCACCT 1860
GTCCCAACAC CTCAACAAGG GCACTCCCAT GAAGACGCAG AAGCGGGCCG CCCTGTACAC 1920
CTGGTACGTC CGCAAGCAGC GAGAGGTGGC GCAGCGTAAG TAATGACCCT ACCCCGCATC 1980
TTCCCTGGGA GGGCCCAGGA CTCTCCCCTA ACTCATAGGT GGGGGCTGGA AGCTTCACCA 2040
TCCCCATTAG ACAGACAGGT AGATGGAAAG GAAGTCAGTG GGATTCAACC TGCATTTATT 2100
ACCTATTCTG CGCCAGGCAC TCTGTGGGAC GGGAGTANAC TTGGTCCTGA ACATCCAAAG 2160
ATGA^.TGAAA TGGGTCCCTG CTTTCTTTTT CTTTTTTTAG ATACGTGACT CTGGAAAAAT 2220
ATGTAAGCTC TCTGAGCCTC AGCTTCTTCA TCTGTACAAT GGGGATAGTA AATGTGCCAA 2280
ATCAGAACAA ATGCTAATGC TTACCTGCAG TCTTGTACTG AGAAGGATGG TGAGATCATA 2340
TCTTGGGTTG GTAGGAAAGC ATTCAGGGAT TGATTAGTGA TGTTTGCCTT GAACACAGGT 2400
TAAGAAAGTG ATGGCATGTG TGCTGTGTGT TTGTCATCAG TAGATTAGA7 GATTTCTAAG 2460
TTCTAGCTGT AAGCTCCTCT GGTTCAGCGC CATGGCAATG AGAAAGAATC AAGGGCAAGG 2520
TCAGGGGAAT GGACGAGGGA AGGTGAGAGT GGCCAGTACC CCACTCACGG CTTTCTGTGC 2580
CTGCAGAGTT CAGCCATGCA GGGCAGGGAG GGCTGATTGA AGAGCCCACA GGTGATGAGC 2640
TACCAACCAA GAAGGGGCGG AGGAACCGTT TCAAGTGGGG CCCAGCATCC CAGCAGATCC 2700
TGTTCCAGGC CTATGAGAGG CAGAAGAACC CTAGCAAGGA GGAGCGAGAG GTACAACGGC 2760
GGGCGGGAAA CAGTGCTGGT TTGGTCTGGG CTGCGGCAAG GCCAGGGGAA GGGGAAGGTG 2820
ACTCTAGGTC CTGTAAAAGG CTGTCCAGTT GCCGAGAACT CCTGATATTG GCTTAGCCTG 2880
GCCCAGAAAA TTGAGAATAC TTGAACCTAA GCCCATTCCT CGCAGCCCCC CTGCACCNTG 2940
GACACCAAGC AACCCCTTCC ATGGATGCTC ACCCAATTCG ATTCTCTCTA CAATCCTATG 3000
GCTCTTTTGC TCAGTTTATG AATGGAGAGA CTGAGGTCAG ACAGACTGTC AATTGCCCAA 3060
GGTCACACAG CAGACCTGGC ATTGGAACCC AGATCTGCCA GCCTCAAACC CTCCGGCAGA 3120
GNTCAGCTTC TCAGAACCCT CCCCTTCATG CCCAGGACAG GGTTCCTCTG AGCCTGGCCT 3180
GGAGGCTCAT GGGTGGCTAT TTCTGCAGGG CGGAA7GGAT CCAGAGAGGG CTGrGGCCAl 3240

CACAGGCACA GGGGCTGGGC TrCAACCTCG TCAGGGAGGT 3CGTGTCTAC AACTGGTTTG 3300
CCAACC3GCG CAAAGAAGAA GCCTTCCGGC ACAAGGTGGC CATGGAGACG TACAGCGGGC 3360
CCCCCCCA3G GCCAGGCCCG GGACCTGCGC TGCCCGCTCA CAGCTCCCCT GGCCTGCCTC 3420
CACCTGCCCT CTCCCCCAGT AAGGTCCACG GTAAGTGGTA TGTGGGGACA AGGGACACGT 3480
GGGAAGGTGG GAGGGTTGGG GAGGACTGTC CCATTGACAG CAGTCACCTA AACCTCTTTG 3540
CACGTCAGTT TGGTTCCATT CGCAGCTGAC CCAGGGATTG GCAAAAGGTA GAAACAAAGG 3600
CAGATTTGCT GGCTGCATAA AGGCAGACAG GCAGATGGCC TAAGCAAACC AATGGAGTTT 3660
GAAGTGCTGA GGGCTGTGGA GGCAGGGGAG GGCAGGGAAG TGGGGTGCTG AGGCAGGACA 3720
CTGCTTCCCT CTCCAGGTGT GCGCTATGGA CAGCCTGCGA CCAGTGAGAC TGCAGAAGTA 3780
CCCTCAAGCA GCGGCGGTCC CTTAGTGACA GTGTCTACAC CCC7CCACCA AGTGTCCCCC 3840
ACGGGCCTGG AGCGCAGCCA CAGCCTGCTG AGTACAGAAG CCAAGCTGGT GAGTGTCCTT 3900
GCTTGTAAGG AAAAGCCAAC CTCATCTTTC CTTGGCAGGG AGATTCTGGA GCAGTCCCTA 3960
GGGAGGCCCr GTGGGGACCC CGGCCCCCCG GACACAGCTT GGCTTCCCCT CGTAGGTCTC 4020
AGCAGCTGGG GGCCCCCTCC CCCCTGTCAG CACCCTGACA GCACTGCACA GCTTGGAGCA 4080
GAGATCCCCA GGCCTCAACC AGCAGCCCGA GAACCTCATC ATGGCCTCAC TTCCTGGGGT 4140
CATGACCATC GGGCCTGGTG AGCCTGCCTC CGTGGGTCCT ACGTTCACCA ACACAGGTGC 4200
CTCCACCCTG GTCATCGGTA AGCTGGTGGG GATGGGTGGG CACCTGGGTG GGAGGCTCAT 4260
GGGGCAACCG CANAATCCAG GAGCTGGAAA AGCCACTGGG ACTCATTCAT TCATTCATTC 4320
ATTCATACAA CATGTTAGGA GAGGGGAGCA GAGAACTGAC CCCATGGCCT TTGCACTGCT 4380
GTGGTACCCC AGGGCTCCAG GGAACCGCAG TTTGACAACT TTTGAACAAG TCACCGCTTG 4440
CTTTTCCCAT TAGCTTAGAC AAAGAGCTAA AGGCTCAGAG AGGGGGAATG ACTTGCCAGA 4500
GCCACTTAAA TTAGTGGCAG GTCCCAGTGG AGGGCTGTTT CCTGACCACC TTGCCCCTTC 4560
TTCCAAACCA CGGGCTCTGG GAAGGAGAGG TGGTGCCCTT GGGAGGTCTT GGGCAGGGGT 4620
GGGATATAAC TGGGGGGCCC AGCTGATTCC CTCCCCTTCC ACTCCAGGCC TGGCCTCCAC 4680
GCAGGCACAG AGTGTGCCGG TCATCAACAG CATGGGCAGC AGCCTGACCA CCCTGCAGCC 4740
CGTCCAGTTC TCCCAGCCGC TGCACCCCTC CTACCAGCAG CCGCTCATGC CACCTGTGCA 4800
GAGCCATGTG ACGCAGAACC CCTTCATGGC CACCATGGCT CAGCTGCAGA GCCCCCACGG 4860
TGAGCACCCT GTGCCCCACA CAGCAGGAGA TGATGATAGA GGTTGGCTGT CAATGGATGC 4920
AGGGGAAAGG GGTGCCTGGC AGGCATTGCA GTCTGCATGT GTCTCTGGGA CAAGTGTGTT 4980
TCCGTGATTG AGGGTGTCTG CAGGCCAGTG TGTTCCCATG TGAATGCACG TATCTGTGTG 5040
TGTGCACGAC TGCTTGTGTG AGCAGATCCC TAGTGCGTGT CTGGGTGTGT ATCGGTTGTG 5100
CATGCATTTG TGTGCATGCC TGTGTTTCTC TGAAACTCTT AGGGCCATAT GAATTTCTAA 5160
AATCTATTCA GACCAGTTTT GAAAATCAGC CTTGGATCTC CAACTGCTGC CCAGTCTGGC 5220
TGTTCAGCAG GCCCCATGCC CCCCTTTCCC CAGTCTTGAG GCGTGGGACT AGGGCTGTCA 5280
GGCACGTTTG CCACGTCTGC CCCTCTCTCC CCTGCGGCCA GCCCTCTACA GCCACAAGCC 5340
CGAGGTGGCC CAGTACACCC ACACGGGCCT GCTCCCGCAG ACTATGCTCA TCACCGACAC 5400
CACCAACCTG AGCGCCCTGG CCAGCCTCAC GCCCACCAAG CAGGTAAGGT CCAGGCCTGC 5460
TGGCCCTCCC TCGGCCTGTG ACAGAGCCCC TCACCCCCAC ATCCCCCGGG CTCAGGAGGC 5520
TGCTCTGCTC CCCCAGGTCT TCACCTCAGA CACTGAGGCC TCCAGTGAGT CCGGGCTTCA 5580
CACGCCGGCA TCTCAGGCCA CCACCCTCCA CGTCCCCAGC CAGGACCCTG CCGGCATCCA 5640
GCACCTGCAG CCGGCCCACC GGCTCAGCGC CAGCCCCACA GGTGAGAGGC CCTGGCTCCA 5700
CCCCCTCCCT TACTGTCCCT GCCCCCTTCC ATGTTGGTCC CACCCCTTCT GTTGCTGTCC 5760
GTCACTGTGG GGCTGTGCAT GCAGCAGGCC TAGGGCTGCT GTGAGGAAGC ACTGGCAGGC 5820
GTGGAAGGGT GGGGTGGCTT CCATGAATCC AGTGTTCACA GTAAGATGTA CTCAGGCCAG 
TCCATGGGCG GCCGTGGACC CTGGCTGGGA GGCTCCCTTT GTTAAGAACC GAGGGTAGAG 
GTGTGACTTT GGGGTTCCTG TTATGTGCTG TGATCCAGGA GGTGTGGCCC TGCCTCCCCA 
TCCTGAGTAC CCCTAGGGAC AGGCAGGTGG GGTGGGTGTG GGTGCCTGGT GGGTGGCTAG 
CAGCCTTGTT TGCCTCTGCA GTGTCCTCCA GCAGCCTGGT GCTGTACCAG AGCTCAGACT 
CCAGCAATGG CCAGAGCCAC CTGCTGCCAT CCAACCACAG CGTCATCGAG ACCTTCATCT 
CCACCCAGAT GGCCTCTTCC TCCCAGTAAC CACGGCACCT GGGCCCTGGG GCCTGTACTG 
CCTGCTTGGG GGGT 

(2) INFORMATION FOR SEQ ID NO: 127:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 631 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 127:
Met Va: ser Lys Leu Ser Gin Leu G - -^^^ . 

1 c ^eu Leu A^a Ala Leu 

15 

-eu Glu Ser Glv Leu S-^r : vc= n .. ^ , 

2C' ''-^ ^'^^ Leu Gly Glu 

30 



Pro Gly Pro Tv 



^yr Leu Leu Ala Gly Glu Giy ,ro Leu Asp Lys Cly Glu 



^° 45 



Ser Cys Gly Gl 



5C 



y Gly Arg Gly Glu Leu Ala Glu Leu Pro Asn Glv : 



55 



60 

Arg Gly Ser Glu Asp 



Gly Leu 



Oly Glu Thr 

Phe Thr Pro Pro j^^u 

i^eu r^i,, ^eu Giu Asn Leu Ser 



lie Leu Lys Glu 

85 "^^^ ser Pro Glu Glu 

^° 95 



Ala Ala His Gin Lys Ala Val Val Glu T'-,- r 

100 lie ^^'-^ Asp Pro 

110 

Trp Arg Val Ala Lys Met Val Lys Ser Tv. , 

115 ^^"^ ^^'^ Gin Has Asr. He 

125 



Pro Gin Arg Glu Val Val Asp Thr Th>- Glv • ^ 

130 1^ ^■^J' -s-i Asn Gin Ser H 

^ 140 

O.n „.3 , 

160 



«. T,. ... , 



165 ^ ^"'^ Ala Gin Gin 

175 



T., „.3 

190 

..o ... 



S., 

220 

„^ .^^^ ^^^^ 

cys .1. « 3„ ^ 

250 

255 



Asn ^eu Val Thr Glu Val Arg Val Ty- Asr Tr^ 

260 ^^"^ ^""P Asn Arg Arg 

270 

=i„ p., 

^^'^ 285 

Pro Pro Pro Gly Pro Gly p-o ru- n 

P.o Glv Pro Ala Leu Pro Ala His Ser Ser 



290 295 300

Pro Gly Leu Pro Pro Pro Ala Leu Ser Pro Ser Lys Val His Gly Val
305 310 315 320

Arg Tyr Gly Gln Pro Ala Thr Ser Glu Thr Ala Glu Val Pro Ser Ser
325 330 335

Ser Gly Gly Pro Leu Val Thr Val Ser Thr Pro Leu His Gln Val Ser
340 345 350

Pro Thr Gly Leu Glu Pro Ser His Ser Leu Leu Ser Thr Glu Ala Lys
355 360 365

Leu Val Ser Ala Ala Gly Gly Pro Leu Pro Pro Val Ser Thr Leu Thr
370 375 380

Ala Leu His Ser Leu Glu Gln Thr Ser Pro Gly Leu Asn Gln Gln Pro
385 390 395 400

Gln Asn Leu Ile Met Ala Ser Leu Pro Gly Val Met Thr Ile Gly Pro
405 410 415

Gly Glu Pro Ala Ser Leu Gly Pro Thr Phe Thr Asn Thr Gly Ala Ser
420 425 430

Thr Leu Val Ile Gly Leu Ala Ser Thr Gln Ala Gln Ser Val Pro Val
435 440 445

Ile Asn Ser Met Gly Ser Ser Leu Thr Thr Leu Gln Pro Val Gln Phe
450 455 460

Ser Gln Pro Leu His Pro Ser Tyr Gln Gln Pro Leu Met Pro Pro Val
465 470 475 480

Gln Ser His Val Thr Gln Asn Pro Phe Met Ala Thr Met Ala Gln Leu
485 490 495

Gln Ser Pro His Ala Leu Tyr Ser His Lys Pro Glu Val Ala Gln Tyr
500 505 510

Thr His Thr Gly Leu Leu Pro Gln Thr Met Leu Ile Thr Asp Thr Thr
515 520 525

Asn Leu Ser Ala Leu Ala Ser Leu Thr Pro Thr Lys Gln Val Phe Thr
530 535 540

Ser Asp Thr Glu Ala Ser Ser Glu Ser Gly Leu His Thr Pro Ala Ser
545 550 555 560

Gln Ala Thr Thr Leu His Val Pro Ser Gln Asp Pro Ala Gly Ile Gln
565 570 575

His Leu Gln Pro Ala His Arg Leu Ser Ala Ser Pro Thr Val Ser Ser
580 585 590

Ser Ser Leu Val Leu Tyr Gln Ser Ser Asp Ser Ser Asn Gly Gln Ser
His Leu Leu Pro Ser Asn His Ser Val Ile Glu Thr Phe Ile Ser Thr

615 620

Gln Met Ala Ser Ser Ser Gln
625 630

(2) INFORMATION FOR SEQ ID NO: 128: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6433 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 128:



CATGAACCCC GAAGAGTAGT GTCTTCTCTC TGGACTAAAG CGGAACTGAG AACCGGTGGA 60
AAAGCCCCGC GCCTAGGCTG CAAGGCACTG GCTTAACAAG TCCAAAGGTT AGGTGAAGTT 120
TGGCTGATAA GCAGAACCAG TAAAAGAAGG TCTCTAGCCC CCCAGCGTGA GTACAATGGA 180
CCCTGGCAAA GCCCCGCTCC CGGCCCAGGT CTTCTGCTCT CCAGGTCTGC CCCTCCGGCT 240
CTCCCTCTCT CCGGGTTTCC CCCTCCCCAC CATCATTTGC ATCCAGCCGA AAGCTGGGC2 300
CTTCCCACTA ATTTGCATAT CTTATATGGC CTAATGGTGG CGATCATGGC AAGTTAGAAG 360
TTTTCTGACT CCTTTCGGAG GAGCCTCCGG GACCCCGGGG AGTAACAGGT 420
TGAAGGGTGG AGGGGTTCCT GGATTTGGGG TTTGCTTGTG AAACTCCCCT CCACCCTCCT 480
CTCTCGCACC CACCCACCCC CTCACCCCCT TCTTTTTCCG TCCTTGGAAA ATGGTGTCCA 540
AGCTCACGTC GCTCCAGCAA GAACTCCTGA GCGCCCTGCT GAGCTCCGGG GTCACCAAGG 600
AGGTGCTGGT TCAGGCCTTG GAGGAGTTGC TGCCATCCCC GAACTTCGGG GTGAAGCTGG 660
AGACGCTGCC CCTGTCCCCT GGCAGCGGGG CCGAGCCCGA CACCAAGCCG GTCTTCCATA 720
CTCTCACCAA CGGCCACGCC AAGGGCCGCT TGTCCGGCGA CGAGGGCTCC GAGGACGGCG 780
ACGACTATGA CACACCTCCC ATCCTCAAGG AGCTGCAGGC GCTCAACACC GAGGAGGCGG 840
CGGAGCAGCG GGCGGAGGTG GACCGGATGC TCAGGTAGGC GCAGAGCCAG GTGGAGGGGA 900
CCCACCCGAA CCCCTGGAGC CCCGGCCCCG GGCCTGAGTG ACACTGCGCC CGACCACACT 960
CGCCAAGCCC GTTTCCCACC AAAAAATTCC CCCGGGGGGC GCTCTGCTTC TCTCCCAACA 1020
CCCGGACCCT TCCCAATCCC TTAGCGGGAC AACCCTGCGG CCCACCGGGC TTCTTCTCCC 1080
CAGGCCCAGG CCATCGTCCT CAGAAGAAAG GGATGAGGTG TACCGTACAG GGGCAGTCAC 1140
CTTCTCCTCT GTTTAGCTTC CATTTTGGCC TCATGTCTAC CCCAAAGTTG TAGCTTAGAT 1200
GGGGGGAAAA TTCAGAATTT TGCATAGACC ATAGGTAGGA CCCCCTAGAA AAAGAATGTT 1260
TCTCCCCAGA TGTCTCCCAC TAGTACCCTA AGCATCTGCT TGTCTGTCTA GTGAGGACCC 1320
TTGGAGGGCT GCTAAAATGA TCAAGGGTTA CATGCAGCAA CACAACATCC CCCAGAGGGA 1380
GGTGGTCGAT GTCACCGGCC TGAACCAGTC GCACCTCTCC CAGCATCTCA ACAAGGGCAC 1440
CCCTATGAAG ACCCAGAAGC GTGCCGCTCT GTACACCTGG TACGTCAGAA AGCAACGAGA 1500
GATCCTCCGA CGTAAGTGTT TTCATCCTGC CTCTGCCTCA ACCTGAAGTG ACCTTTGCCC 1560
TCTCACCCCA TTGGCTGCCT CAGTTTCCCT TTCATCGACA AGGCCTTGTG AGCACTTGGC 1620
AGATATGAGG AAGGTGGCAA GTAGATTTGG CCTTGGTGGT TGCTGTACAA TGGATTGGCT 1680
TCTGTCATGT TCTTCAGTCA CAGCCCCCTT GCTACCCAGC CAGTTGCTCT GAGGAGCCTG 1740
TCAGTGTGAT TGAGCTCACC CACTTGACAT CAAATACAGG AGTTCAGGAT GCAGAGTGTT 1800
GCTTCATCTC TGAAGGCCAG TGAGCCAAAG GGGAAAAAAT AATAATTTTC TTAAAACTAT 1860
AGCTGGCTAT GTTTGAGCTC CTTCAAAGAA AGGAAAAGGG TGGCTTTGCT GGAGCAACTG 1920
AGGTGGGCAG TAAGGGCCTG TGCTGAGGGC TCCCCATCTC CAGCTCCACA TGCAGTGAGA 1980
GAAGGTTGCA AAGCTTAGTT AGACGAGGGG AA7AAACCTG TCTTCGTCCG TTGTCTGTCT 2040
GTCTGTCTGT CTGTCTGCTG AGTGAAGGCT ACAGACCCTA TCAAATCTAC TCCTTTCTCT 2100
TTTCAGAATT CAACCAGACA GTCCAGAGTT CTGGAAATAT GACAGACAAA AGCAGTCAGG 2160
ATCAGCTGCT GTTTCTCTTT CCAGAGTTCA GTCAACAGAG CCATGGGCCT GGGCAGTCCG 2220
ATGATGCCTG CTCTGAGCCC ACCAACAAGA AGATGCGCCG CAACCGGTTC AAATGGGGGC 2280
CCGCGTCCCA GCAAATCTTG TACCAGGCCT ACGATCGGCA AAAGAACCCC AGCAAGGAAG 2340
AGAGAGAGGC CTTAGTGGAG GAATGCAACA GGTAACACCA CCAGAAGCTC AGGTGGGCAG 2400
GTGGGCAAGT ACACAGACCC AGGAACCCTC CCCTCGGTCC TGGGATATTG AGACACTAGT 2460
TATACAGATA AGTGTGGCTA AATCAGAGCT TCTCAAAGTA TGTTCCACAG TGATTGTGTG 2520
TTTTGGGCCA AGCACCAACA AGTCCCCCCG CCCCCCTTCA CTCACCATCT CCCCTCCATC 2580
CATTCCCAGG GCAGAATGTT TGCAGCGAGG GGTGTCCCCC TCCAAAGCCC ACGGCCTGGG 2640
CTCCAACTTG GTCACTGAGG TCCGTGTCTA CAACTGGTTT GCAAACCGCA GGAAGGAGGA 2700
GGCATTCCGG CAAAAGCTGG CCATGGACGC CTATAGGTCC AACCAGACTC ACAGCCTGAA 2760
CCCTCTGCTC TCCCACGGCT CCCCCCACCA CCAGCCCAGC TCCTCTCCTC CAAACAAGCT 2820
GTCAGGTAAG CAAAGGTTGG GCCTCACTGC CTCGGCAACC 2AACCATCCT GGTTCTTGCC 2880
ioo...^ o^o_._AG AGGAGCAAAC GC'^'-t'tga-- — ^ ^

^^^...o^^^ TAGG 2=^4" 

a--«..c..T c.™cc.=. A....crc.c ct==^.., .CT=r=cc.. t=»....^, J, 

C^==C.CT. 3.c.X«C... ^^^^^ ^^^^ 

GGC3CTTACA TTCTAGAATT iAATAGA=AA CATCrrATA- -TA-C—- 
CGATATTTCT TGTGGGTGGA CAGGGGAGGA GAAAGCAACT TTAT7TTCTT ATTACCCACC 
CTTGAAAACA AGAGGTGCCG AGrCArTGTT CCAGGACCCT OGTGOCAC.A A.GTTCCC.A 
CTGGGTXTGT OrTGTTTTGC AGGAG.GCGC TACAGCCAGC AGSCAAACAA TGAGATCACT 
TCCTCCTCAA CAATCAGTCA CCATGGCAAC AGCGCCATGG .GACCAGCCA GTCGGX.TTA 
CAGCAAGTCT CCCCAGCCAG CCTGGACCCA OGCCACAATC TCCTC.CACC XGATGGT;^ 
A.OG.GAG.A CACC.GGGCC A.XGTCGC.C XGGAGCTGAX AACATAACAG GCAAAACAAA 
CCAACXXCT CACAAGGCCT GCCXCAAACA AXGAACCAXX GXAGCCCCAX AGGGGAAAAX 
CGCGCXGXC CACAGXCGGA AAGGAGAGGX AGXCCXGCXG ACCCACCCX. XGGCGGGXAG 
AAAACCCAAA GTGAXGGGAX XACAGGOGXG AAGCACCAXG CCCAGCCAAX AATXGXXATX 

GAGTGAAXGA AGGAATGAAT XXGAGAACTA GXCATGCCAA r-^r^r. 

^ILATG^CA^A GuAATCGCXA AGXCACATCG 

TGT^GGAAAC XGCXCXXXGX GGXCCAAGXC CACCCAXGXX XCXCXTGXXT rXXXCXCXCC 
ATCAGAXCXC AGXCXCA3GA GGAGGXXTGC CCCCAGXCAG CACCXXGACG AATAXC^A^A 
CCCXCXCCCA CCAXAAXCCC CAGCAAXCXC A^CCXCAX CAXGACACCC CXCXCXGGAG 
TCATOGCAAT TGCACAAAGT AAGXXCXAXT CTXGGXXGGA AAACCXGGGG GCAGGGACAA 

GAAGAATGGG AAGCAAATTA Zi^r'^nn-r^r.j, 

^AAATTA ATG.GG.oAA AAAXAACXGX AGGXCTCCTX CAAACTCACC 

CACAACTAGX AAATTXGGXX XAACXXCTXX AGTTXCXCAX CXOXCXCCTX AAAXCCAAXA 

TT^GGAXXGX XXAGCCXAAA ACAAGAAAAA ATTGTGGAAX GGAXXX3GAX CCXGGTCACA 

GXXXAGCAGC XGXGCAXCCX GGGXCAAAXC ATXGAACCXA XGACXCTGGG AGACXCXCAG 

GCTXXAAXCA GATCXGTTTA ATGCCCAXCT CCAA-C-Ar^A ^^r^-r. 

ui_AA>.>.^ACA AV..CATTGXG GAACTXGAGC 

AAGTAAATXA ATAXCTCCAA GXCXrCGTTX C-TTA-Ar— r-^^^r-r. 

^.TXA.AC. G.CXCCCAXG GAATCTCCTA 

TGXAACAGGC TCAGCCCGGX GACXGGGACA T"rAr— 00 

^^^ii^^^ACA X.GAG^GGGG GCTCAAATGA XGGCAXCCAX 

CCACCTCXCC XTAXCCCAGG AGCTGTCTGT GX-XXTT-"- -^^r-^,^ 

^.l^TXXT.'^T CXTGCXCCCA CAGGCCXCAA 

CACCrcCCAA CCACA.A.T. TCC.TaxCAT CAACA.T=.= SCC=3CA=CC T=3CAac:CC- 
OCAOrCCCT. CAC.-C.CCC AaCASCrOCA CA.CCCTCAC ™=CA=CCCC TCA.=CA=CA 



3180 

3240 

330C 

3360 

3420 

34B0 

3540 

3600 

3660 

3720 

3780 

3840 

39O0 

3960 

4 02 0 

4080 

414 0 

4200 

4260 

4320 

43B0 

4440 

4 500 

4560
GAGCCCAGGC AGCCACATGG CCCAGCAGCZ CTTCATGGCA GGTGTGACTC AGCTGCAGA;^. 4620
CTCACACAGT AAGGACACGG GCATGTGGAG GGAGGGAGCA CTCAGGACCC TCAGTGGCGA 4680
ACGACTTTCC CTCTCTGGGT CTGAACTTTC TCGGAAGTTT ATTGGCTTGG TCACTTTTG? 4740
CTGCCTATGA TCAACCGACT AAGACAATTT CTCAAGCATA ACTCTTGAGT GTTGCTGTAC 4800
CTTTTCTAGT CCTCTTCTCT ACCCCTGAGA TTCCCAGGGA AGGGTTTGAA TGACCTTTGC 4860
TCCCGTTCGG TACCGGAGGC GTCCCTGGTA GGAAATGTGT TCTGAGAGCA GGTGGTTTGT 4920
CCCrCACAGC CAAGCATCCA CATGCTTTCG GGAGTTGGTT ATGTGACTTG GAATTTACAT 4980
GT^TCTTATG GATAACTAAT ATGAGAAATC CCCACTATAA CCACCAGGCC TTTTATCTAC 5040
CTGAGGAGAT GGGAGCTATG GTGTGGGATG GGGGCTCTGT ACCTGTGTCT TTGCCTGTGT 5100
ATGCACCTTG ATTCTGTCTT CACTCTGTGT CTCCAGTGTA CGCACACAAG CAGGAACCCC 5160
CCCAG7ATTC CCACACCTCC CGGTTTCCAT CTGCAATGGT GGTCAGAGAT ACGAGGAGCA 5220
TCAGTACACT CACC.^iACATG TCTTCAAGTA AACAGGTAAT GCCAGCAGGA TATGCGGGGG 5280
TTGGGGTGTG GGCAGGGTGT GATAAGGCCA TGGATGTGCA AAGGTTGTGG CAAGCATGGA 5340
CTCGGCCAGA AATTATATCC TCTTTGCTGG TTGAGTTGGG CATCATCTCC CTTAGAGAAG 5400
CCAAACTAAT GGCCCATGAC CCTGCCAAAT GACACAGCTG AGCAGCCTCT CTCGTGTCTC 5460
TCTGCAGTGT CCTCTAGAAG CCTGGTGATG GCGACACACC AGTTACTTCG TGCGCAACAA 5520
GAAGGACCCT GTTTTCCACA CCATCACCCT CTGGGCAGCT GTCATGGAAA AGCGCAGTGA 5580
CCTGACCAGC ACCTGGGAGA GGTCCCTGCT ACCTGACGGA CGTCCTGCTG GCACCTCAGA 5640
CAATCCACTC TCAGGAGGCG GAGCCCGAAG CCCAGTTTCC CTTCTATGCA GTATTGCCAG 5700
AATGCCTCTC CCACGATGTC AAGGACTCCT GTCTGTCCTG GAGGTGGGAG ACAAGGAAGC 5760
ACCGAAGAGG AAGCAAGAAA GCCGTACTGT CTATGTTGTG ATCCTTCATC GAACAAACTG 5820
ATGCGAAAAC TTGAATCTGT TAGTGAAATG AGGAGAGAAG GACATGTGCT ATTGAACTGA 5880
GCCAAACAGA CTGTAAATAT CCACAGACTC CCTCCCCTGC CCCCATCCCA CATGATCTTG 5940
AGATTTCTTT TAAAGAAGTA AATTTGTCCA ATGGCTGTAA ACTATAAACT ACTGTAAT7A 6000
AGTGCAATTT CCCCTGTGTG TCCTCTCCCC TCTGGCCTGT ATATAATACT AAAGTGTCTA 6060
TTAGTTTTGT TTGTAAAGGT CAGAGTCAAA ATTTCAAAAG TGATGTGTCC CCTCTCCCCT 6120
CATGGAGAAA CATCCTAAGT GGGAAGTGAA GCCCCTTGTC CTCTCCCGCG GGCCTGGACA 6180
CTTATGGGGA CAGCATACCT TGGACTGACT ACCAGCTAAC TCCAGTCTCC TGACATTAAG 6240
ACACAGCTCT GGATCCCTGG AGGGGCTGAA TGTAGTGTGT CAGAGTAACA TGGGAGCTTC 6300
CTGTGGGCrA GGAGCTCAGC CTGGACTGCC TAAGAAACCC CAGGGCAGGG AA^GTGGCTG
TTTGATAGGA GAAGAAAAAG TTGCAGTCTC AAAAGCGTTr C;^ ~A-AAJ. "^A
TCACTAAAAA AAA

(2) INFORMATION FOR SEQ ID NO: 129: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 609 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 129:

Met Val Ser Lys Leu Thr Ser Leu Gln Gln Glu Leu Leu Ser Ala Leu
1 5 10 15

Leu Ser Ser Gly Val Thr Lys Glu Val Leu Val Gln Ala Leu Glu Glu
20 25 30

Leu Leu Pro Ser Pro Asn Phe Gly Val Lys Leu Glu Thr Leu Pro Leu
35 40 45

Ser Pro Gly Ser Gly Ala Glu Pro Asp Thr Lys Pro Val Phe His Thr
50 55 60

Leu Thr Asn Gly His Ala Lys Gly Arg Leu Ser Gly Asp Glu Gly Ser
65 70 75 80

Glu Asp Gly Asp Asp Tyr Asp Thr Pro Pro Ile Leu Lys Glu Leu Gln
85 90 95

Ala Leu Asn Thr Glu Glu Ala Ala Glu Gln Arg Ala Glu Val Asp Arg
100 105 110

Met Leu Ser Glu Asp Pro Trp Arg Ala Ala Lys Met Ile Lys Gly Tyr
115 120 125

Met Gln Gln His Asn Ile Pro Gln Arg Glu Val Val Asp Val Thr Gly
130 135 140

Leu Asn Gln Ser His Leu Ser Gln His Leu Asn Lys Gly Thr Pro Met
145 150 155 160

Lys Thr Gln Lys Arg Ala Ala Leu Tyr Thr Trp Tyr Val Arg Lys Gln
165 170 175

Arg Glu Ile Leu Arg Gln Phe Asn Gln Thr Val Gln Ser Ser Gly Asn
180 185 190

Met Thr Asp Lys Ser Ser Gln Asp Gln Leu Leu Phe Leu Phe Pro Glu
195 200 205

Phe Ser Gln Gln Ser His Gly Pro Gly Gln Ser Asp Asp Ala Cys Ser
210 215 220

Glu Pro Thr Asn Lys Lys Met Arg Arg Asn Arg Phe Lys Trp Gly Pro
225 230 235 240

Ala Ser Gln Gln Ile Leu Tyr Gln Ala Tyr Asp Arg Gln Lys Asn Pro
245 250 255

Ser Lys Glu Glu Arg Glu Ala Leu Val Glu Glu Cys Asn Arg Ala Glu
260 265 270

Cys Leu Gln Arg Gly Val Ser Pro Ser Lys Ala His Gly Leu Gly Ser
275 280 285

Asn Leu Val Thr Glu Val Arg Val Tyr Asn Trp Phe Ala Asn Arg Arg
290 295 300

Lys Glu Glu Ala Phe Arg Gln Lys Leu Ala Met Asp Ala Tyr Ser Ser
305 310 315 320

Asn Gln Thr His Ser Leu Asn Pro Leu Leu Ser His Gly Ser Pro His
325 330 335

His Gln Pro Ser Ser Ser Pro Pro Asn Lys Leu Ser Gly Gly Lys Gln
340 345 350

Arg Leu Gly Leu Thr Ala Ser Ala Thr Gln Pro Ser Trp Phe Leu Pro
355 360 365

Arg Ile Leu Ser Gly Leu Arg Val Phe Arg Gly Ala Asn Ala Phe Glu
370 375 380

Met Ile Leu Gly Pro Leu Ser His Cys Gln Asn Ile Leu Pro Trp Lys
385 390 395 400

Gly Val Arg Tyr Ser Gln Gln Gly Asn Asn Glu Ile Thr Ser Ser Ser
405 410 415

Thr Ile Ser His His Gly Asn Ser Ala Met Val Thr Ser Gln Ser Val
420 425 430

Leu Gln Gln Val Ser Pro Ala Ser Leu Asp Pro Gly His Asn Leu Leu
435 440 445

Ser Pro Asp Gly Lys Met Ile Ser Val Ser Gly Gly Gly Leu Pro Pro
450 455 460

Val Ser Thr Leu Thr Asn Ile His Ser Leu Ser His His Asn Pro Gln
465 470 475 480

Gln Ser Gln Asn Leu Ile Met Thr Pro Leu Ser Gly Val Met Ala Ile
485 490 495

Ala Gln Ser Leu Asn Thr Ser Gln Ala Gln Ser Val Pro Val Ile Asn
500 505 510



Pne Ser Gin 



Va: Ala Giy Ser Leu a. a Ala Le-^ Glr. Pic V 

Gin Le-^ His Ser Pre His Gin Glr. P— Le- N--r - 

53 0 Ser Fro Glv 

** 

Ser His Met Ala Gin Gin Pro Phe Me- A^;. - ^ 

54 5 ^^-r G.n Leu Gin 

Asn Ser H.s Met Tyr A.a His Lys Gin Glu 

565 5-^^^ 



Pro Pro Gin Tyr Ser His 
575 



Thr ser Arg Pne Pro Ser Ala Met Val Val ... Asp Thr Ser Ser 

585 

ser Thr Leu Thr Asn Me. Ser Ser Ser Lys Gin Cys Pro 
Trp 



590 

5 95 •^'^^ ^eu Gin Ala 



(2) INFORMATION FOR SEQ ID NO: 130:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 10014 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 130:
TGOGTTOCCT CTGACTCCAC TOGCOATACC CCCACAAAGC CCACTCTGAA CGTAGCAGAC 60
03GTGGAGAG AAACAGGOGG ATGGCAAGGG GGATACGAAA CAGGGAGAGG GAGGAGGGGO 120
AAGAGGATGG ACGTCTACCA GGCCCCACTT GGTGCTTGAT TTATGCCATC TCATTTCCTT 180
CTCAAACCAC CCTTTGAAGT TGATTGTACA TTTTACAGAA AAGGAAACTG AGGCTCGGAG 240
AGGAGAATCA TTTACCCAAG GTCCCAGTTA GTAGACGGTA GGTGCCTGAA TGTAAATCCA 300
CCTCTCTGCC TGCTCCGCOA GGGGGTGGGG GTGAGGGAAA CAGGAGAATG TGATGGOAAA 360
ATCCGAGATG GAGCCAGCCT GGGCCAGAAA CACTGGGAGC TGTGGGAGAC GGAGAGGGGC 420
AGGGTGGGAT CACAGGGAGC AGGAGCGGGG AATTGGAGGT GAATCTGGCC CTCCCAAACT 480
TCCAGTCCAT TCTGCTCCCA GGGGAACCGG GAAACTGCGG GGGAACTGGA AGGGAOCTCC 540
CAGAACAAGO ATCCAGAAGA TTGGCATCTG GGGCCTOGGA TTTAGGTTTC TAAATCGTGG 600
CCCATGGGGC AGCCTTATCT CTGCAAAAGC ATTGAGGGTA GAACTCAATG ATTTGGGAAG 660
TTATTGAATT AGGGGATCTC GGAGGTAGGC TGTCAGTGCC TGATAGTATC AGTTACAATG 720
CCTGACTTGG GGTGACAATG GCTTGGAGGG GTGGGTGAGT CAAGGGTCAA ATGAGTGCCC 780
G\'GAGTCATG ATGCCTGCCT TGTACAATTG ATAACTGAAC ATlGGTGAGT TAGGGCCCCA 840
GCAGTTG7AA TTAGCACCCC GGGTGTCAGC CAGAAACCAA CAAACAGCCA AATCCCTGCA 900
GCCCCGCCCA GCCTATCCAC CGGCGGGGGA CCGATTAACC ATTAACCCCC ACCCCTCCCC 960
GGCAGAGGCT CCACCCCTTC ACAGAGGCTA GGCCAAGACT CCCAGCAGAT CTTCCCAGAG 1020
GACGGTTTGA AAGGAAGGCA GAGAGGGCAC TGGGAGGAGG CAGTGGGAGG GCGGkG:}GCG 1080
GGGGCCTTCG GGGTGGGCGC CCAGGGTAGG GCAGGTGGCC GCGGCGTGGA GGCAGGGAGA 1140
ATGCGACTCT CCAAAACCCT CGTCGACATG GACATGGCCG ACTACAGTGC TGCACTGGAC 1200
CCAGCCTACA CCACCCTGGA ATTTGAGAAT GTGCAGGTGT TGACGATGGG CAATGGTAGG 1260
TGGGGGCAGA TGTGCCCAGG TGTGCCAGTG GGGGCAGGTG TGCCTGGGTG CAGGAGCAGA 1320
TCTTTGGCAC TCAACTTTGG GGTGGGAGGA GAATGATACA AAATGGTAGG TTGGTCCTAC 1380
AGGCCAGCAC AGGTGTTGCC AAGTGAAGCC CATGTGCCCA GGCACAGTGA TCACAGGCAT 1440
TCTGGGTGAA GGGAGGCCTG CAAGGGCCAA TTTCCAGCAA AAGTCGATCC CGGCTATTCC 1500
TCCCAGGCCC TTCCAGTCCT CACTGCGTCA CAGTGGCTCT GCTTGGCGCT TGGCACAGTG 1560
ACATGATGGT GAGCTCCCCC TTGGTGCCCA GCTCCAGCGA TTCAGGCCAG CAGGGCCCCT 1620
TGGTGAACCC CTTGGGCC7A GGTTCAGAGA GACGGCAAGG GATGTTGTAT CCCTGGAGAT 1680
GGTGGTTGGA GACATAACGG CATTTCTCGG TGTCTTTGGG ACTTTCCTAG GGAAATGAAA 1740
TTGGCACTTA GGGAAAATGG AGCTGTCAGG GAAGTTTTGC TAACTACGAA GCCAACTCAG 1800
CACTGTGTGT GTTGTGTGTG CGTTCGTGTG TGATAGTGAG TTTCCATGTA GGTTGTATGG 1860
GTGGGGTGAT GCCTTCAGGA ACCCATTTGC ATATGTGTGT TCATTTGTCT CTGTGTGTGA 1920
GTTCTGGGTC TATTTTCCTT TGTATTCATT GAGTGGGTCT GTGTTTGTGT CTTAGGAGTT 1980
GCCCGTGTTG ATCTTGCTTA TGTATGTAAG TGTGTATGTG TGTGTACTTG TGTCTGTGGA 2040
TGTTTGTACA TGTGTGCTGT GTGTGCGGGT CATAGAGCAC ATGCGTTTGT GCATGCGGAC 2100
CTGTTGGAGT GCCCTGTTCT TCCTGCATCT TTATCCTGTA TGGGCGTTTT GTCGTGTGCC 2160
CATATTTGTA CCTGCTGTGT ATATATGCAG TTCCCTGTGC TGCGGGCGGG GGTCAGCGGT 2220
CTCTGGTGTG CACGACTGCA CAGACGCAAA TGCAGGACTC TGTTGTTGCC ACTCACCAAG 2280
TGAGATTCA7 ATCAGCAACA TGTCCGTTTG TCTCTGAGCA GATTTTGT7G CCGCTGCGTC 2340
TCGCCAGA77 GAGGCA7CCC CTCCGACATC ACTGGAGCAT ATCTGGAGGG GTGGACAGTT 2400
C7GCACAGGG AGG7AGGGGA AAAGAGGAGG CCCGGAAACC CC7CC7GGAG GGAAGAGCCC 2460

CATCGGTCCC AGGCCAGCCT CAGAG3AGAG GGGGCAGGGA GGTGGCTGA3 GTGAGrCTGT 2520
CACCCTGCTT CCTTGTGTGT CTTGGAGCCA CTCAGCCAGT ATGAGGCTGr AGGTGGAGCT 2580
GAGGTC7GGA ATGTTGT3GT CAGCTCAGGT AGGG7GAGGA GGGAGG: 2640
TGTTGTCA3C TCAGCAGGTG CTCACCTGCC CCTGCCGTCC AGTCACGTGT GArCTTGGGr 2700
ATGTCACCTC CCCTATCCTG GCTTCTGTAT CTTCTACAAA ACAGGCTTCA TTCCCCCAGG 2760
CCTGCTGGCT GGACGGCTTT TAGGCCTGTC TGAGGACCAC GCCAGGAGCG CAAGGCAAAA 2820
ACAGACCAGA GATCCCCTTG CGAGTTAGGA GGCCGGCTCC CACCCCAG.AA GGTGGCCAGG 2880
TTTTCATGCC TTCCTAGAGA AAGCTGGGGC TGGTGGCCTC CACCACAGGG AGACGCAGAC 2940
CCTCAGAAAC AAGTCTGTGA AGTCACAACC AGCCCCAGTT TACAGATGTG AAACTGAAGC 3000
TCCAAAAAGT CAGGAGGTCA CTGAGTGGGG AGGTGATGGA GTGGGAACAG CCCCCAGATC 3060
TGGCTGAGGC CGAAGCCCTG GAGAGATCCC CGCAAGGCTC CCTTAGATGG CTGACATTCT 3120
GCTCTTCCTG AAGCCTCACT CCCTTCTCTC CTGGCGCAGA CACGTCCCCA TCAGAAGGCA 3180
CCAACCTCAA CGCGCCCAAC AGCCTGGGTG TCAGCGCCCT GTGTGCCATC TGCGGGGACC 3240
GGGCCACGGG CAAACACTAG GGrGCCTCGA GCTGTGACGG CTGCAAGGGC TTCTTCCGGA 3300
GGAGCGTGCG GAAGAACCAC ATGTACTCCT GCAGGTGAGG AGGCTCAATT TCTTCAGCTG 3360
GGAAATGGGC ACACTTGGGC TCATGGGCGC AAGGTCTGTC TTCTGCCTGA GTGGGTAGGT 3420
CCCAGAGACA GCTGCCCTTG AGGGCCTTCA AGGCTCTTCT GGTTTTGTAA AAGACTTTGT 3480
GAATCCAAGA AGAGCATCTA TTCTAGGAAC CACATTTAGT GATCATCAAG GTACTGGCTG 3540
CCGTTTATTG AGCTCTTATC ATATGCCAGG CACAATACTA AGTCTTTGTG TGTATTTACG 3600
TACTCCAGAG GTCAAGGTTC CCAACTCAGG TCTAACACCA ACGAGCAGAG GGACCCAGGA 3660
CCACATGTTG CCTCTCTGAG CCTCAGTTTT CCCATGTTTA GCAGGACAGG ACTGGGCTCT 3720
TAGAGAGTTC ATAGCACCTT TCCAGCTCCT GGTGGGTTCA AGAGAGAACT CCCGGGATGA 3780
AGAGATGAGA GCACTGAGGT TGGGGGGTCA ACTGGATAGC CAGGGCCCTA GTTCTGTCCT 3840
AAGAGGAGGA AGTTGTGTCT TCTCCATCCA ACCATCCAAA GCCCTCCCCA GATTTAGCCG 3900
GCAGTGCGTG GTGGACAAAG ACAAGAGGAA CCAGTGCCGC TACTGCAGGC TCAAGAAATG 3960
CTTCCGGGCT GGCATGAAGA AGGAAGGTGA GCCTCGCCCG TCGCCGCCCC ACCACCACTG 4020
CCCCACCTGC ACCCACAGCT CCCCGACAGT CATTTACAAC TGTAGCCACA CTT7ATGACT 4080
CAGTGGCAGG CCCCAGGGTG ACTGGCTAAT GGCTGAGAAG AGGGAGGGCC TGGAAATCTG 4140
ACCATAGGGA GCGGCTGGGC TTGGTCTTGA GAAAGATTCT CCCACTCGTC ATCAGTCACA 4200
GACACCCCCA CCCCCTACTC CATCCCTGTT CTCCCTCCTC ACCTCTCTGT GCCTCCTCAC 4260

CCGTCCAGAA TGAGCGGGAC CGGATCAGCA CTCGAAGGTC AAGCTATGAG GACAGGAGCC 4320
TGCCCTCCAT CAATGCGCTC CTGCAGGCGG AGGTCCTGTC CCGACAGGTA CCGGGGTGAT 4380
CCTGCCACCC ACCCAGGGAT CCCCCACACT ACAGAGGAGC TCACCTCGTC CACCTCCATT 4440
CTCCCCAGCC AGGCCCTGGA GCAGCTGACG GGAGGGGCCT CAGATATTAC AGAAGGGACA 4500
CTGAGTGCGG TTTCACATGG CCCAGTTTGC AGCAAGGGCA GGAATCGAAC CTGGCGCCCT 4560
GGGGCACTTT CTAATTCATC CTACTGCCTG CATCCCACAG GCCAAGCAGA GTCTTCACCT 4620
TCACTGAGGG CCTGCGATCA GCTCAGCTCC GAGAGAACAG AGCAGTGGCT CAGTGGAGAG 4680
AGGTGGCAAA GTGGGGCCCA GCCCTTCCCT TGCTGA.GTGA CCTTGGGCAA GTGACAGCAC 4740
CTCTCTGAGC CATGGTTGCC TCATTGTCAG AAAAGGATGA TGATTTTTTG CCCTGCTTCT 4800
CCTCTAAGGC TGACAGACTC CTTGGGGCTC TAAAGCTGTT CTCCCTCATC CCTGCCTCCT 4860
CCCTCCCTCC GTTTTTACCC TGAGCTTCCT TCAGAGCTGG AGGGCACCCA CTATCCAGCC 4920
CCCTCCCCAC ATCTGATTCC AGGGAGGGGG CTCTGTGCAG GGGACAGAGA ATQCGGGAGG 4980
GCCCGGACAT CTGCAGCATT TTCTTCCCTG TATCTCTCGA AGATCACCTC CCCCGTGTCC 5040
GGGATCAACG GCGACATTCG GGCGAAGAAG ATTGCCAGCA TCGCAGATGT GTGTGAGTCC 5100
ATGAAGGAGC AGGTGCTGGT TCTCGTTGAG TGGGCCAAGT ACATCCCAGC TTTCTGCGAG 5160
CTCCCCCTGG ACGACCAGGT GAGGATGGGC GTGGATGGTG GGCAGTAGTG GGCAGTGGGC 5220
GGGGCAGCCA GGGGGCTGCT GGCCCACCTG GGATATAGCC GTGGACTGGC TTGATTTTAT 5280
TTTATTTAAC AAAATATGTA GTGCACACAC GTGTCTGAAA CT7TAAATCA CCTTACAAAT 5340
ATTAACTCAG TTAGCTCCTC CAACAACTCT ATGAGGTAGG TACTAAGGTA CTATTATTAC 5400
TGCCATCTCA TAGGTGAGGA GATTGGGGCA CAGAGAGGTT AAGTAACCTG CTCAAGGTCA 5460
CATAGCTACT ATCCAGCATA GCTGGGATTT TTACAAAGCA CCCTTCATAA TTCTCCATAG 5520
CTGGTCCATG GGTGGGAATT TGGGACCCAC AGTTTTGGAA CTTTTTGGGA TCATAGACCT 5580
TTTTGAGAAT CTCAAAAAAG AAAAAAAAAG CACACAGAAT GTTGCTTACA GTTTGATCAG 5640
GCACACAGAA GAGGCCCAGC ACGAAGCAGT TTCTTGCCCA AGGACACAGC AGTTCAAGGA 5700
CAGAGTCAGC GCGAGGTCTC TCAGCTCTGA GCACATGTTC TTTCCCCTTC CAGGTTTCTA 5760
GTTTTATGGG TAGTAGTTTT ATGATGCCCA TTTCACAGTT CAGGCAGGTA GAGGCAGAGG 5820
GGAGCATTAA GCTGACTTGC CCAGCGTCAC TGAGTTGGCT ACGGGCAGCC TTCCCAAGGG 5880
TACAGATGGC AAACACTGTT CCTTCTCTCT TTCA3GTG3C CCTGCTGAGA 3GCGATGGTG 5940

GGGAGGAGGT GGTGCTCGGA GCCACGAAGA GATCCATGGT GTTGAAGGAC GTGCTGCTCC 6000
TAGGTGAGGC GGCTGCCTGC CCTGGGCAGG GGTCCAGGGA GGGTATGCCT AGCATGGCAC 6060
TCACCCAGGC AAGGAGATTC ACATGGTGGC ATGCAAGGGT GAGGGAGACT AGTCAGGAGT 6120
GGCGCTGTCC TCAGGCTTGC ATTGGAGGGC TCCAGGAGTC AGTTTTCAAG rGGGTACCCZ 6180
ACTGAGATGC AAGGAAATGT GGATGCAAGT CACCAAATTC CCAGCATTGA AGTCAGAGCA 6240
CGATCAGGGT TATCCCTGGA ATTACCTGTG CATCCTTTTT TCTTTTGACA GAGTCTTGCT 6300
CTGTCACTCA GGCTGGAGTG CAATGATGTG AGCAAACACT ACCTATTTTA ATATAACAAT 6360
GCTATGAGGG AGCTCGATTA 7TTATCCTCA TCTTATAGAT AAGAAAACTG AGGCACAGAG 6420
AGGTTAAGTA ACTTATCCAA CTATAACCAG CTATCAGGGG CAGAGCCATT TAAGCAGGGC 6480
AGTGCAGTTC GAGAATCTGG TCCTTTAACC TTGATGCTTT GG7GCCTA7C AGGTGACCTT 6540
TGAATGTCAT CGATCTTGTG AGTCATGTTG GTAAATGGAG CTTGGGTCAT GTGAAAGAGG 6600
TCCTAGAAAG CCAAGTTCCA AGCTCAGCCG GATGACTCAA GGCAGCTrAT CTTCTGAATC 6660
TGGGCCTCAG CTTCCTTACC TGTGAAATGG GAGTCACCAT CCCTGCAGGT CCTCGTCCCA 6720
CAGGCACCAG CTATCTTGCC AACTTAAAAG CCAAAACTAG AGGAGAGGGG TCAAGCCAAG 6780
GTGACTTCGC ATCCTCCCTC CCTCCCAACC CTTCCAGGCA ATGACTACAT ^GTCCCTCGG 6840
CACTGCCCGG AGCTGGCGGA GATGAGCCGG G7GTCCATAC GCA7CCT7GA CGAGC7GG7G 6900
C7GCCC77CC AGGAGC7GCA GATCGATGAC AA7GAGTA7G CC7ACCTCAA AGCCA7CATC 6960
T7C777GACC CAGGTACAGT GCACACC7CC 7AAGCCATCC C7GAC7C7C7 C7CCAGAACG 7020
CTC7GCCAGA C77C7CCTAT 7GGG7TC7G7 ACAC7GAG7T CACAGCCTCA 7C7CATG77A 7080
ACGACAGCCA GGAGAGGCCG TTTTCATTTA ACAGATGAGG CAAGTCAAGA 7T7GAAGAGA 7140
CAA7A7GGCC GGGCGCAGTG GCTCACACC7 G7AA7CCCA7 CAC7TTGGGA GGCTGAGGCG 7200
GGCGGATCAC C7GAGG7CAG GGG7CAAGAT GAGCCTGGCT AACATGGAGA ^^CCCCATC7 7260
C7AC77AAAA G7GGC7CTGC CAACAAC7GG C7G7GCGACC CAGGACAAG7 CC7A7CTTTG 7320
CAC7G7G7C7 GGG77TCCCC GTG7G7AAGA 7GAGGCGG7T GC7AGGTGC7 TA77GGA7GC 7380
A77CCTCAAG 7CCCGCCCTC CATC7CCTA7 7CCCC7CTCT TC7GG7T7AG 7GC77TAGGA 7440
AA7G7GGCAG AAA7CTTTTT CTGCCTG7G7 C7AGGAAA7C A7AA7TCATG CTGGCG7ACC 7500
C7GG77G773 AGG7CGCTGA A7CCT7GTGC CCACAC7GCT GAAGACTCC7 7G7G7GACAC 7560
AAG7CAGGGG AGA7C7GGG7 CT7GAC7C7C GAGA7GC7CC AGC7GGACCC 7GC7GCCC7C 7620
CCiTGCCCAC CCTCTTCCAT TGTAGATGCC AAGGGGCTGA GCGATCCAGG GAAGATCAAG 7680

CGGCTGCGTT CCCAGGTGCA GGTGAGCTTG GAGGACTACA TCAACGAr CG CCAGTATGAC 774 0 

TCGCGTGGCC GCTTTGGAGA GCTGCTGCTG CTGCTGCCCA CCTTGCAGAG CATCACCTGG 7 8 00 

CAGATGATCG AGCAGATCCA GTTCATCAAG CTCTTCGGCA TGGCCAAGAT TGACAACCTG 786 0 

TTGCAGGAGA TGCTGCTGGG AGGTCCGTGC CAAGCCCAGG AGGGGCGGGG TTGGAGTGGG 7 92 0 

GACTCCCCAG GAGACAGGCC TCACACAGTG AGCTCACCCC TCAGCTCCTT GGCTTCCCCA 798 0 

CTGTGCCGCT TTGGGCAAGT TGCTTAACCT GTCTGTGCCT CAGTTTCCTC ACCAGAAAAA 8 04 0 

TGGGAACAAG GCAATGGTCT ATTTGTTCAG GCACCGAGAA CCTAGCACGT GCCAGTCACT 8100 

GTTCTAAGTG CTGGCAATTC AGCAAAGAAC AAGATCTTTG CCCTCGGGGA GGCTGTGTGT 816 0 

GTGTGAGTAT GTATGGATGC GTGGATATCT GTGTATATGC CCGTATGTGC GTGCATGTQT 82 2 0 

ATATAAAGCC TCACATTTTA TGATTTTGAA ATAAACAGGT AATATGAGGG ACACATAGAT 82 8 0 

GCTATAAGTA GGTCAGTTGG CTGCAGCAGA GATGTGGGGG ATGAGGCTGA AAGGTGAGGC 8340

GGGACCAAAT GGTTGAAGGA CTTGCACTCC AAGGAGCTTT GAGAGCCATT GATTACATCC 84 0 0 

ATTATGTTAC TATGTGACCA ATACATTACT CATTAGAACA TTTACGTGAT CTCAGAGCTT 84 6 0 

CCTTATATGC ACCTTGTTCC TTTCAACTCA CTTTTGTTCT CTTGGTTTTT TGGGGTCCTC 8 52 0 

TTAACACCCT CATGAAGTCT ATAGATGGGA ATGGTACACC CTAGTTTACT AACCCAGGAA 8 58 0 
TAGGTACCGA ACAGGCACTG CCAATATTGG ATGGGCTGGT TGATTGGCCA CGCCTGAGGA . 86 4 0 

AGATGGCGTC CCAAGGCCTG AGGTCTGCAT CCCAGACTCT CCATCCTGAT CGACCTTCTC 8700 

TACCTGCAGG GTCCCCCAGC GATGCACCCC ATGCCCACCA CCCCCTGCAC CCTCACCTGA 8 76 0 

TGCAGGAACA TATGGGAACC AACGTCATCG TTGCCAACAC AATGCCCACT CACCTCAGCA 882 0 

ACGGACAGAT GTGTGAGTGG CCCCGACCCA GGGGACAGGC AGGTGGGCAA ACTCTGGGAT 88 8 0 

TTTACCTTGC AAAGGGTGAG GATGGGGCTT AAGACAGGAG GCAGGAGAAA GTGGAGTCTA 8940

GAAGGTAGAA CCAGGATGCA ACAGTTTTCT GGGTTCCAGG GTAGGGAATA AAGGGCAAGA 90 00 

TTGTCCATTT GTTGAGGCTG TTTATTCAGT AAGGTGACTG ACAGCCTTTA CTGAATGAAG 9 06 0 

CCATTGTTGG GATGAGGCAA TCCACTGGAT GAGGTAACCC ATTGGGTGAA GATGTCTTGG 912 0 

GTGAGAATTC CATTAGTTGA CATTGTCCAT TAAGTAAAAG TGGTCATTGA AGTAAGGCTG 918 0 

CACAGTTGGG TAAGGCTATC CATTAGACAT TAGATGAGAC TACCCATTGG GTCAGGATGT 92 4 0 

CTGCTGGGCT ATTTGGGAGA AGCAGTCCAA GTCTGCATAT CAAATAAATG ATGGAGGAGA 93 0 0 
TGGGTGGTAG GACCTTCGAG ACCTCATAAA ACTTAGGCTT TATGATCTGG GACTGACAGA 9360

AGGT7GAGCA ATAAAAGAGC TTAGGGATTA TCTGGCTTAA TTAATTCTCT GATTTTATAG 

AGGAAGAAAT TAAGTCAAGG TGGGGCAGGG TGGGAGGGGA GAACTTTGCG GGGGCTGTTG 

ArrrA'CTCGC acaaaggctg gaattttgag cagggcgtgt ctgtctgttt gtccttcczz 

ACCCCTGAGA CCCZACAGCC CTCACCGCCA GGTGGCTCAG GGTCTGAGCC CTATAAGCTG 

ctgccgggag ccgtcgccac aatggtcaag cccctctctg ccatccccca gcggaccatc 

ACCAAGCAGG AAGTTATCTA GCAAGCCGCT GGGGCTTGGG GGCTCCACTG GCTCCCCCCA 

gccccctaag agagcacctg gtgatcacgt ggtcacggca aaggaagacg tgatgccagg 
ACCAGTCCCA gagcaggaat gggaaggatg aagggcccga gaacatggcc taaggcacat 
cccactgcac cctgaggcgc tgctctgata acaagacttt gacttgggga gaccgtctac 
tgccttggac aactttctca tgttgaagcc actgccttca ccttcacctt catccatgtc 

CAACCCCCGA CTTCATGCCA AAGGACAGCC GCCTGGAGAT GACTTGAGCC TTAC 



(2) INFORMATION FOR SEQ ID NO: 131:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 567 amino acids

(B) TYPE: amino acid
(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 131:

Met Arg Leu Ser Lys Thr Leu Val Asp Met Asp Met Ala Asp Tyr Ser
1 5 10 15

Ala Ala Leu Asp Pro Ala Tyr Thr Thr Leu Glu Phe Glu Asn Val Gln
20 25 30

Val Leu Thr Met Gly Asn Gly Pro Ser Ser Pro His Cys Leu Thr Val
35 40 45

Ala Leu Leu Gly Ala Trp His Ser Asp Met Met Ile Leu Leu Pro Leu
50 55 60

Arg Leu Ala Arg Leu Arg His Pro Leu Arg His His Trp Ser Ile Ser
65 70 75 80

Gly Gly Val Asp Ser Ser Pro Gln Gly Asp Thr Ser Pro Ser Glu Gly
85 90 95

Thr Asn Leu Asn Ala Pro Asn Ser Leu Gly Val Ser Ala Leu Cys Ala
100 105 110






Ile Cys Gly Asp Arg Ala Thr Gly Lys His Tyr Gly Ala Ser Ser Cys
115 120 125

Asp Gly Cys Lys Gly Phe Phe Arg Arg Ser Val Arg Lys Asn His Met
130 135 140

Tyr Ser Cys Arg Phe Ser Arg Gln Cys Val Val Asp Lys Asp Lys Arg
145 150 155 160

Asn Gln Cys Arg Tyr Cys Arg Leu Lys Lys Cys Phe Arg Ala Gly Met
165 170 175

Lys Lys Glu Ala Val Gln Asn Glu Arg Asp Arg Ile Ser Thr Arg Arg
180 185 190

Ser Ser Tyr Glu Asp Ser Ser Leu Phe Ser Ile Asn Ala Leu Leu Gln
195 200 205

Ala Glu Val Leu Ser Arg Gln Ile Thr Ser Pro Val Ser Gly Ile Asn
210 215 220

Gly Asp Ile Arg Ala Lys Lys Ile Ala Ser Ile Ala Asp Val Cys Glu
225 230 235 240

Ser Met Lys Glu Gln Leu Leu Val Leu Val Glu Trp Ala Lys Tyr Ile
245 250 255

Pro Ala Phe Cys Glu Leu Pro Leu Asp Asp Gln Val Ala Leu Leu Arg
260 265 270

Ala His Ala Gly Glu His Leu Leu Leu Gly Ala Thr Lys Arg Ser Met
275 280 285

Val Phe Lys Asp Val Leu Leu Leu Gly Asn Asp Tyr Ile Val Pro Arg
290 295 300

His Cys Pro Glu Leu Ala Glu Met Ser Arg Val Ser Ile Arg Ile Leu
305 310 315 320

Asp Glu Leu Val Leu Pro Phe Gln Glu Leu Gln Ile Asp Asp Asn Glu
325 330 335

Tyr Ala Tyr Leu Lys Ala Ile Ile Phe Phe Asp Pro Asp Ala Lys Gly
340 345 350

Leu Ser Asp Pro Gly Lys Ile Lys Arg Leu Arg Ser Gln Val Gln Val
355 360 365

Ser Leu Glu Asp Tyr Ile Asn Asp Arg Gln Tyr Asp Ser Arg Gly Arg
370 375 380

Phe Gly Glu Leu Leu Leu Leu Leu Pro Thr Leu Glu Ser Ile Thr Trp
385 390 395 400

Gln Met Ile Glu Gln Ile Gln Phe Ile Lys Leu Phe Gly Met Ala Lys
405 410 415

Ile Asp Asn Leu Leu Gln Glu Met Leu Leu Gly Gly Gly Pro Cys Gln
420 425 430

Ala Gln Glu Gly Arg Gly Trp Ser Gly Asp Ser Pro Gly Asp Arg Pro
435 440 445

His Thr Val Ser Ser Pro Leu Ser Ser Leu Ala Ser Pro Leu Cys Arg
450 455 460

Phe Gly Gln Val Ala Gly Ser Pro Ser Asp Ala Pro His Ala His His
465 470 475 480

Pro Leu His Pro His Leu Met Gln Glu His Met Gly Thr Asn Val Ile
485 490 495

Val Ala Asn Thr Met Pro Thr His Leu Ser Asn Gly Gln Met Cys Glu
500 505 510

Trp Pro Arg Pro Arg Gly Gln Ala Ala Thr Pro Glu Thr Pro Gln Pro
515 520 525

Ser Pro Pro Gly Gly Ser Gly Ser Glu Pro Tyr Lys Leu Leu Pro Gly
530 535 540

Ala Val Ala Thr Ile Val Lys Pro Leu Ser Ala Ile Pro Gln Pro Thr
545 550 555 560

Ile Thr Lys Gln Glu Val Ile
565



(2) INFORMATION FOR SEQ ID NO: 132:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 470 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 132:

AAGTAAGCCT TGTTTTTCCA CACTCATTCT CCCAGGTTTT CTTTGGATAG GCTTACTTTT 6 0 

CCATGCTGGA GGAGGGGCTA TCCCTTCATT TTGCCTCTCC CGCTTCCCTC CCTCTCCCCC 12 0 

TCCCCCTGCT TTCTCTCCCT CTGCACTTTG TGAACTGCTG CTGCAGTGCT GAAGTCCAAA 180 

GTTCAGTAAC TTGCTAAGCA CACAGATAAA TATGAACCTT GGAGAATTTA CCAATGTAAA 24 0 

CAGATAGCCA AGGGTCCCTT TATCAGCACT GGCTCAGGAC AGTCGTGGGG GGTCTGAAGT 3 00 

GGCTCAATTT TGTATTTTGT TTTTTTTGGG GGGGTGTAAA GGCG3ZA(^GZ TGCGCTGTGC 36 0 

CCGCTGCTGA CAGTCGGGCG TGTTACCTCG GGAACATGGT GTAGGGAAGC TGGAAGCAGG 42 0 

ATAACGTGGA ACTCAACCCA AGAAACGCCA GCCTGAAGAC CATGGTCTCG 4 70 




(2) INFORMATION FOR SEQ ID NO: 133: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 467 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 133:

TCACAGCTAT TAGCTCATCG CTGCCAAATT GCCCCTTTAC CTAGGCTTGT GTCACTTTCA 6 0 

CCTTCTCATT CTCTTACTTT TACATTCTTC CTTGATATTT TGCTTTTTCA ACTTTTGGAA 12 0 

ATTTCTTTCT CTCTTCTACC CCTCCTCATA TTCCTCTGCA CTCCCCCCTC TCTAACTCAT 18 0 

GCACTTTGTG GGGTCCAAAG TTCAGTAACT TGCAAAGCAC AGGGATAAAG ATGAACCTTG 24 0 

GAAGATTTAC TCTGCTCTGA TGTAAACAGA GAGTGACAAG GGTCCCTTAT CTATGTCTCA 3 00 

GAGAAGCCTG TCCGGGGGGT GACCACTTGC TGGTTGTGGC TGCACAGTGT GTTTTTTTGG 36 0 

GGGGGAGGAG GAAACAGAAG GTGGGTAGAG CATGGACTCC CGCCCGCTGA TCCGTGTTAC 42 0 

AGCCGCAGAT GGTGAGGCAG TAGAAGGCAA CAGACAGGAT GGCGTCT 46 7 

(2) INFORMATION FOR SEQ ID NO: 134: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 479 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 134: 

TTTCGGGGGT GGGACCCAAC GCTGCTCTCC TGATGGCCTC CCTGGCTCCC AGCACCTTCC 6 0 

ATCCCAGCTG CTCAGGGCCC CTCACCTGCG CCTCCCCCAC CCTCCCCTCT GCCCACTCCC 12 0 

ATCGCAGGCC ATAGCTCCCT GTCCCTCTCC GCTGCCATGA GGCCTGCACT TTGCAGGGCT 18 0 

GAAGTCCAAA GTTCAGTCCC TTCGCTAAGC ACACGGATAA ATATGAACCT TGGAGAATTT 24 0 

CCCCAGCTCC AATGTAAACA GAACAGGCAG GGGCCCTGAT TCACGGGCCG CTGGGGCCAG 300 

GGTTGGGGGT TGGGGGTGCC CACAGGGCTT GGCTAGTGGG GTTTTGGGGG GGCAGTGGGT 36 0 

GCAAGGAGTT TGGTTTGTGT CTGCCGGCCG GCAGGCAAAC GCAACCCACG CGGTGGGGGA 42 0 

GGCGGCTAGC GTGGTGGACC CGGGCCGCGT GGCCCTGTGG CAGCCG^iGCC ATGGTTTCT 47 9 

(2) INFORMATION FOR SEQ ID NO: 135:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 605 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 135:

TGGG3CCTG3 GATTTAGGTT TCTAAATCGT GG3CCATGGG GCAGCCTTAT CTCTGCAAAA 6 0 

GCATTGAGG3 TAGAAGTCAA TGATTTGGGA AGTTATTGAA TTAGGGGATC TCGGAGGTAG 120 

GCTGTCAGTG CCTGATAGTA TCAGTTAGAA TGCCTGACTT GGGGTGACAA TGGCTTGGAG 180 

GGGTGGGTGA GTCAAGGGTC AAATGAGTGC CCGTGAGTCA TGATGCCTGC CTTGTACAAT 24 0 

TGATAACTGA ACATCGGTGA GTTAGGGCCC CAGCAGTTGT AATTAGCACC CCGGGTGTCA 3 00 

GCCAGAAACC AACAAACAGC CAAATCCCTG CAGCCCCGCC CAGCCTATCC ACCGGCGGGG 360 

GACCGATTAA CCATTAACCC CCACCCCTCC CCGGCAGAGC CTCCACCCCT TCACAGAGGC 420 

TAGGCCAA3A CTCCCAGCAG ATCTTCCCAG AGGACGGTTT GAAAGGAAGG CAGAGAGGGC 480

ACTGGGAGGA GGCAGTGGGA GGGCGGAGGG CGGGGGCCTT CGGGGTGGGC GCCCAGGGTA 540

GG3CAGGTGG CCGCGGCGTG GAGGCAGGGA GAATGCGACT CTCCAAAACC CTCGTCGACG 600

ACATG 605



(2) INFORMATION FOR SEQ ID NO: 136: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 478 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : Single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 136:

TCCTGGAGAG TGGGACCCAG CGCCGCACCC AGAGGCCTCC TGGCTCCTGC TGCCTCTAGC 6 0 

CCTGCGCCCC TGGCCCCTCT CCACCTCCCC CACCCTCCCT TCTGCTCACT CCCAATTGCA 12 0 

GGCCATGACT CCGGTCCGC3 TCCCTCTCAC CCCCATGAGG CCTGCACTTG CAAGGCTGAA 180 

GTCCAAAGTT CAGTCCCTTC GCTAAGCGCA CGGATAAATA TGAACCTTGG AGAATTTCCC 24 0 

CAGCTCCAAT GTAAACAGAG CAGGCAGGGG CCCTGATTCA CTGGCCGCTG GGGCCAGGGT 3 00 

TGGGGGCTGG GGGTGCCCAC AGAGCTTGAC TAGTGGGATT TGGGGGGGCA GTGGGTGCA3 360 

CGAGCCCGGT CCGTTGACTG CCAGCCTGCC GGCAGGTAGA CACCGGCCGT GGGTGGGGGA 420 

GGCGGCTAGC TCAGTG3CCT TGGGCCGCGT GGCTGGTGGC AGCGGAGCCA TGGTTTCT 4 78 




(2) INFORMATION FOR SEQ ID NO: 137:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 622 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 137:

TGGGCTTGGG TGTTAGGTTT CCAGTTCAAG CGACCCAGGA CAGCTTTATC TCAAATTGAG 6 0 

GATAGAAGTC AATGATCTGG GACGTGATTG GCTTAGGGCT TCATAGTGGT AGGCTTGCCA 12 0 

GTGTCTAAAC ATGTCAGCTG GGTTGTCCAC CTTGGTGAGA CTTGGGGGCT GCTGAGGCAA 18 0 

GGGGTCCAAC CAATGCCAGT CCTGTTGGGT GCCTGCCTTG GAAGATTGGT AAGTGACTAT 24 0 

TAATGAGCGG GAGGTGGGGG GGGGGCAACA GTTGTAATTA GCACCCCAGG TGTCAGTCAG 3 00 

AAACCAACAA ACAGCCAAAT CCTCGTGGCT CCKCCCAGCC TACCCAGCAA CGGGGGTGAT 36 0 

TAACCATTAA CTCCTACCCC TCCCCACAGA GCCTCCACCC TCTGCAGAGG CTAGGCCAGG 42 0 

ACGCCAGGCT GAGTCTCCCA GAGGACAGTT TGAAAGAGAG GAAGGCAGAG AAGGGACCTG 48 0 

GGAGGAGGCA GGAGGAGGGC GGGGACGGGG GGGGCTGGGG CTCAGCCCAG GGGCTTGGGT 54 0 

GGCATCCTGG GCCGGGCAGG ACAGGGGGCT AAGGCGTGGG TAGGGGAGAA TGCGACTCTC 6 00 

TAAAACCCTT GCCGGCGATA TG 622

(2) INFORMATION FOR SEQ ID NO: 138:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 470 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 138: 

TCTTGGGCAG TGGGACCAGC GCTGCTCCCA GAGGCCTCCT GGCTCCTGGT GCCTCTCTCC 6 0 

CTGCGCCCCT GGTTCCCGCT CCACCTCCCC CACCCGCCCT TCTGCTCACT CCCAATTGCA 12 0 

AGCCATGGCT CCCGGTCCGG TCCCTCTCGC TGCTGTGAGG CCTGCACTTG CAAGGCTGAA 18 0 

GTCCAAAGTT CAGTCCCTTC GCTAAGCACA CGGATAAATA TGAACCTTGG AGAATTTCCC 24 0 

CAGCTCCAAT GTAAACAGAG CAGCAGGGGG CCCTGATTCA CTAGCCGCTG GGGCCAGGGT 3 00 

TGGGGGTTGG GGGTGCCCAC AGGGCTTGAC TAGTGGGATT TGGGGGAGCA GTGGGTGCAG 36 0 

CGAGCCTGGT CCGTTGACTG CCAGCAGTAG ACACCGGCCG TGTGTGGGGG AGGCGGCTAG 420 
(2) INFORMATION FOR SEQ ID NO: 139:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 557 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 139:

Met Val Ser Lys Leu Thr Ser Leu Gln Gln Glu Leu Leu Ser Ala Leu
1 5 10 15

Leu Ser Ser Gly Val Thr Lys Glu Val Leu Val Gln Ala Leu Glu Glu
20 25 30

Leu Leu Pro Ser Pro Asn Phe Gly Val Lys Leu Glu Thr Leu Pro Leu
35 40 45

Ser Pro Gly Ser Gly Ala Glu Pro Asp Thr Lys Pro Val Phe His Thr
50 55 60

Leu Thr Asn Gly His Ala Lys Gly Arg Leu Ser Gly Asp Glu Gly Ser
65 70 75 80

Glu Asp Gly Asp Asp Tyr Asp Thr Pro Pro Ile Leu Lys Glu Leu Gln
85 90 95

Ala Leu Asn Thr Glu Glu Ala Ala Glu Gln Arg Ala Glu Val Asp Arg
100 105 110

Met Leu Ser Glu Asp Pro Trp Arg Ala Ala Lys Met Ile Lys Gly Tyr
115 120 125

Met Gln Gln His Asn Ile Pro Gln Arg Glu Val Val Asp Val Thr Gly
130 135 140

Leu Asn Gln Ser His Leu Ser Gln His Leu Asn Lys Gly Thr Pro Met
145 150 155 160

Lys Thr Gln Lys Arg Ala Ala Leu Tyr Thr Trp Tyr Val Arg Lys Gln
165 170 175

Arg Glu Ile Leu Arg Gln Phe Asn Gln Thr Val Gln Ser Ser Gly Asn
180 185 190

Met Thr Asp Lys Ser Ser Gln Asp Gln Leu Leu Phe Leu Phe Pro Glu
195 200 205

Phe Ser Gln Gln Ser His Gly Pro Gly Gln Ser Asp Asp Ala Cys Ser
210 215 220

Glu Pro Thr Asn Lys Lys Met Arg Arg Asn Arg Phe Lys Trp Gly Pro
225 230 235 240

Ala Ser Gln Gln Ile Leu Tyr Gln Ala Tyr Asp Arg Gln Lys Asn Pro
245 250 255

Ser Lys Glu Glu Arg Glu Ala Leu Val Glu Glu Cys Asn Arg Ala Glu
260 265 270

Cys Leu Gln Arg Gly Val Ser Pro Ser Lys Ala His Gly Leu Gly Ser
275 280 285

Asn Leu Val Thr Glu Val Arg Val Tyr Asn Trp Phe Ala Asn Arg Arg
290 295 300

Lys Glu Glu Ala Phe Arg Gln Lys Leu Ala Met Asp Ala Tyr Ser Ser
305 310 315 320

Asn Gln Thr His Ser Leu Asn Pro Leu Leu Ser His Gly Ser Pro His
325 330 335

His Gln Pro Ser Ser Ser Pro Pro Asn Lys Leu Ser Gly Val Arg Tyr
340 345 350

Ser Gln Gln Gly Asn Asn Glu Ile Thr Ser Ser Ser Thr Ile Ser His
355 360 365

His Gly Asn Ser Ala Met Val Thr Ser Gln Ser Val Leu Gln Gln Val
370 375 380

Ser Pro Ala Ser Leu Asp Pro Gly His Asn Leu Leu Ser Pro Asp Gly
385 390 395 400

Lys Met Ile Ser Val Ser Gly Gly Gly Leu Pro Pro Val Ser Thr Leu
405 410 415

Thr Asn Ile His Ser Leu Ser His His Asn Pro Gln Gln Ser Gln Asn
420 425 430

Leu Ile Met Thr Pro Leu Ser Gly Val Met Ala Ile Ala Gln Ser Leu
435 440 445

Asn Thr Ser Gln Ala Gln Ser Val Pro Val Ile Asn Ser Val Ala Gly
450 455 460

Ser Leu Ala Ala Leu Gln Pro Val Gln Phe Ser Gln Gln Leu His Ser
465 470 475 480

Pro His Gln Gln Pro Leu Met Gln Gln Ser Pro Gly Ser His Met Ala
485 490 495

Gln Gln Pro Phe Met Ala Ala Val Thr Gln Leu Gln Asn Ser His Met
500 505 510

Tyr Ala His Lys Gln Glu Pro Pro Gln Tyr Ser His Thr Ser Arg Phe
515 520 525

Pro Ser Ala Met Val Val Thr Asp Thr Ser Ser Ile Ser Thr Leu Thr
530 535 540

Asn Met Ser Ser Ser Lys Gln Cys Pro Leu Gln Ala Trp
545 550 555



(2) INFORMATION FOR SEQ ID NO: 140:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 516 amino acids
(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 140:

Met Asp Met Ala Asp Tyr Ser Ala Ala Leu Asp Pro Ala Tyr Thr Thr
1 5 10 15

Leu Glu Phe Glu Asn Val Gln Val Leu Thr Met Gly Asn Gly Pro Ser
20 25 30

Ser Pro His Cys Leu Thr Val Ala Leu Leu Gly Ala Trp His Ser Asp
35 40 45

Met Met Ile Leu Leu Pro Leu Arg Leu Ala Arg Leu Arg His Pro Leu
50 55 60

Arg His His Trp Ser Ile Ser Gly Gly Val Asp Ser Ser Pro Gln Gly
65 70 75 80

Asp Thr Ser Pro Ser Glu Gly Thr Asn Leu Asn Ala Pro Asn Ser Leu
85 90 95

Gly Val Ser Ala Leu Cys Ala Ile Cys Gly Asp Arg Ala Thr Gly Lys
100 105 110

His Tyr Gly Ala Ser Ser Cys Asp Gly Cys Lys Gly Phe Phe Arg Arg
115 120 125

Ser Val Arg Lys Asn His Met Tyr Ser Cys Arg Phe Ser Arg Gln Cys
130 135 140

Val Val Asp Lys Asp Lys Arg Asn Gln Cys Arg Tyr Cys Arg Leu Lys
145 150 155 160

Lys Cys Phe Arg Ala Gly Met Lys Lys Glu Ala Val Gln Asn Glu Arg
165 170 175

Asp Arg Ile Ser Thr Arg Arg Ser Ser Tyr Glu Asp Ser Ser Leu Phe
180 185 190

Ser Ile Asn Ala Leu Leu Gln Ala Glu Val Leu Ser Arg Gln Ile Thr
195 200 205

Ser Pro Val Ser Gly Ile Asn Gly Asp Ile Arg Ala Lys Lys Ile Ala
210 215 220

Ser Ile Ala Asp Val Cys Glu Ser Met Lys Glu Gln Leu Leu Val Leu
225 230 235 240

Val Glu Trp Ala Lys Tyr Ile Pro Ala Phe Cys Glu Leu Pro Leu Asp
245 250 255

Asp Gln Val Ala Leu Leu Arg Ala His Ala Gly Glu His Leu Leu Leu
260 265 270

Gly Ala Thr Lys Arg Ser Met Val Phe Lys Asp Val Leu Leu Leu Gly
275 280 285

Asn Asp Tyr Ile Val Pro Arg His Cys Pro Glu Leu Ala Glu Met Ser
290 295 300

Arg Val Ser Ile Arg Ile Leu Asp Glu Leu Val Leu Pro Phe Gln Glu
305 310 315 320

Leu Gln Ile Asp Asp Asn Glu Tyr Ala Tyr Leu Lys Ala Ile Ile Phe
325 330 335

Phe Asp Pro Asp Ala Lys Gly Leu Ser Asp Pro Gly Lys Ile Lys Arg
340 345 350

Leu Arg Ser Gln Val Gln Val Ser Leu Glu Asp Tyr Ile Asn Asp Arg
355 360 365

Gln Tyr Asp Ser Arg Gly Arg Phe Gly Glu Leu Leu Leu Leu Leu Pro
370 375 380

Thr Leu Glu Ser Ile Thr Trp Gln Met Ile Glu Gln Ile Gln Phe Ile
385 390 395 400

Lys Leu Phe Gly Met Ala Lys Ile Asp Asn Leu Leu Gln Glu Met Leu
405 410 415

Leu Gly Gly Ser Pro Ser Asp Ala Pro His Ala His His Pro Leu His
420 425 430

Pro His Leu Met Gln Glu His Met Gly Thr Asn Val Ile Val Ala Asn
435 440 445

Thr Met Pro Thr His Leu Ser Asn Gly Gln Met Cys Glu Trp Pro Arg
450 455 460

Pro Arg Gly Gln Ala Ala Thr Pro Glu Thr Pro Gln Pro Ser Pro Pro
465 470 475 480

Gly Gly Ser Gly Ser Glu Pro Tyr Lys Leu Leu Pro Gly Ala Val Ala
485 490 495

Thr Ile Val Lys Pro Leu Ser Ala Ile Pro Gln Pro Thr Ile Thr Lys
500 505 510

Gln Glu Val Ile
515
(2) INFORMATION FOR SEQ ID NO: 141:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 17 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 141:
GCGGGACCGG ATCAGCA



(2) INFORMATION FOR SEQ ID NO: 142: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 142:

Arg Asp Arg Ile Ser

1 5
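
By way of illustration only, and not as part of the sequence listing as filed, the relationship between the oligonucleotide of SEQ ID NO: 141 and the peptide of SEQ ID NO: 142 may be checked with the following Python sketch. It rests on stated assumptions: the character printed as "3" in SEQ ID NO: 141 is read as G, the reading frame is taken to begin at the second base, and the names `CODON_TABLE` and `translate` are hypothetical helpers introduced here.

```python
# Standard genetic code in the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, offset=0):
    """Translate the complete codons of dna, starting at offset, into one-letter codes."""
    dna = dna.replace(" ", "").upper()
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


# Assumption: the "3" printed in SEQ ID NO: 141 is read here as G.
seq_141 = "GCGGGACCGG ATCAGCA"
seq_143 = "GCGGGACTGG ATCAGCA"
print(translate(seq_141, offset=1))  # RDRIS, i.e. Arg Asp Arg Ile Ser
print(translate(seq_143, offset=1))  # RDWIS, i.e. Arg Asp Trp Ile Ser
```

Under these assumptions the first oligonucleotide encodes the peptide of SEQ ID NO: 142, and the single C to T difference in SEQ ID NO: 143 changes the third residue from Arg to Trp.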



(2) INFORMATION FOR SEQ ID NO: 143:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 17 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 143:
GCGGGACTGG ATCAGCA 



(2) INFORMATION FOR SEQ ID NO: 144: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 144:

Ala Glu Val Leu Ser Arg Gln

1 5



(2) INFORMATION FOR SEQ ID NO: 145:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: modified_base

(B) LOCATION: 16

(D) OTHER INFORMATION: /note= "N = C or T"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 145:

GCGGAGGTCC TGTCCNGACA GGTACCGGGG
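
As a hedged illustration of what the ambiguity symbol at position 16 implies, the following Python sketch expands SEQ ID NO: 145 into its concrete variants and translates each from the first base. It reads the feature note as N = C or T; the helper names `expand` and `translate` are hypothetical and are not taken from the specification.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand(oligo, choices):
    """Return every concrete sequence obtained by resolving each ambiguity symbol."""
    oligo = oligo.replace(" ", "").upper()
    pools = [choices.get(base, base) for base in oligo]
    return ["".join(combo) for combo in product(*pools)]


def translate(dna, offset=0):
    """Translate the complete codons of dna, starting at offset; '*' marks a stop codon."""
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


# Assumption: N stands for C or T, as read from the feature note.
variants = expand("GCGGAGGTCC TGTCCNGACA GGTACCGGGG", {"N": "CT"})
for variant in variants:
    print(variant, translate(variant))
# GCGGAGGTCCTGTCCCGACAGGTACCGGGG AEVLSRQVPG
# GCGGAGGTCCTGTCCTGACAGGTACCGGGG AEVLS*QVPG
```

With C at position 16 the first seven codons read Ala Glu Val Leu Ser Arg Gln, matching SEQ ID NO: 144; with T the sixth codon becomes the stop codon TGA.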



(2) INFORMATION FOR SEQ ID NO: 146:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: modified_base
(B) LOCATION: 8

(D) OTHER INFORMATION: /note= "N = C or T"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 146:
AAAGCAANGA GAGAT



(2) INFORMATION FOR SEQ ID NO: 147:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4 amino acids
(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: Modified-site

(B) LOCATION: 3

(D) OTHER INFORMATION: /note= "X = R or any amino acid"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 147:

Lys Gln Xaa Glu
1
CLAIMS

1. A method for screening for diabetes comprising:

a) obtaining sample nucleic acid from an animal; and

b) analyzing the nucleic acids to detect a mutation in an HNF-encoding nucleic segment;
wherein a mutation in the HNF-encoding nucleic acid is indicative of a propensity for non-insulin
dependent diabetes.

2. The method of claim 1, wherein the HNF-encoding nucleic acid is an HNF1α-encoding nucleic
acid.

3. The method of claim 1, wherein the HNF1α-encoding nucleic acid is located on human
chromosome 12q.

4. The method of claim 2, wherein the HNF1α-encoding nucleic acid is located at the MODY3 locus.

5. The method of claim 1, wherein the HNF-encoding nucleic acid is an HNF4α-encoding nucleic
acid.

6. The method of claim 5, wherein the HNF4α-encoding nucleic acid is located on human
chromosome 20.

7. The method of claim 5, wherein the HNF4α-encoding nucleic acid is located at the MODY1 locus.

8. The method of claim 1, wherein the HNF-encoding nucleic acid is an HNF1β-encoding nucleic
acid.

9. The method of claim 8, wherein the HNF4α-encoding nucleic acid is located at the MODY4 locus.

10. The method of claim 1, wherein the nucleic acid is DNA.

11. The method of claim 1, wherein the step of analyzing the HNF-encoding nucleic acid comprises
sequencing of the HNF-encoding nucleic acid to obtain a sequence.

12. The method of claim 11, wherein the sequence of the HNF-encoding nucleic acid is compared to a
native nucleic acid sequence of an HNF gene.
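
Purely as an illustrative sketch, and without limiting the claimed method, the comparison of an obtained sequence with a native sequence can be expressed in Python as below. The function name `point_differences` is hypothetical, the two inputs are assumed to be already aligned and of equal length, and the example strings are the 17-base oligonucleotides given as SEQ ID NO: 141 (with its "3" read as G) and SEQ ID NO: 143.

```python
def point_differences(reference, sample):
    """List (1-based position, reference base, sample base) for every mismatch.

    Assumes the two sequences are already aligned and have equal length;
    insertions and deletions would need a proper alignment step first.
    """
    ref = reference.replace(" ", "").upper()
    smp = sample.replace(" ", "").upper()
    if len(ref) != len(smp):
        raise ValueError("sequences must have equal length; align them first")
    return [(i + 1, r, s) for i, (r, s) in enumerate(zip(ref, smp)) if r != s]


native = "GCGGGACCGG ATCAGCA"   # SEQ ID NO: 141, "3" read as G (assumption)
variant = "GCGGGACTGG ATCAGCA"  # SEQ ID NO: 143
print(point_differences(native, variant))  # [(8, 'C', 'T')]
```

An empty list would indicate that no point difference from the reference was found over the compared stretch.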

13. The method of claim 12, wherein the sequence of the HNF-encoding nucleic acid is compared to a
native nucleic acid sequence of HNF1α.

14. The method of claim 13, wherein the native nucleic acid sequence of HNF1α has a sequence set
forth in SEQ ID NO: 2.
15. The method of claim 12, wherein the sequence of the HNF-encoding nucleic acid is compared to a
native nucleic acid sequence of HNF4α.

16. The method of claim 15, wherein the native nucleic acid sequence of HNF4α has a sequence set
forth in SEQ ID NO:78.

17. The method of claim 12, wherein the sequence of the HNF-encoding nucleic acid is compared to a
native nucleic acid sequence of HNF1β.

18. The method of claim 17, wherein the native nucleic acid sequence of HNF1β has a sequence set
forth in SEQ ID NO:90.

19. The method of claim 1, wherein the HNF-encoding nucleic acid comprises at least one point
mutation.

20. The method of claim 1, wherein the HNF-encoding nucleic acid has a translocation mutation.

21. The method of claim 1, wherein the HNF-encoding nucleic acid has a deletion mutation.

22. The method of claim 1, wherein the HNF-encoding nucleic acid has an insertion mutation.

23. The method of claim 1, wherein the HNF-encoding nucleic acid is an HNF1α-encoding nucleic acid
and a mutation occurs in exon 2, exon 4, exon 6, or exon 9 of the HNF1α-encoding nucleic acid.

24. The method of claim 1, wherein a mutation occurs at codon 131, 142, 159, 171, 289, 291, 292,
273, 379, 401, 447, 547, or 548 of an HNF1α-encoding nucleic acid having the sequence of SEQ ID
NO:1.

25. The method of claim 1, wherein the HNF-encoding nucleic acid is an HNF1α-encoding nucleic acid
and a mutation occurs at the splice acceptor of intron 5 or intron 9.

26. The method of claim 1, wherein the HNF-encoding nucleic acid is an HNF1α-encoding nucleic acid
and a mutation is a mutation defined in Table 8.

27. The method of claim 1, wherein the HNF-encoding nucleic acid is an HNF4α-encoding nucleic acid
and a mutation occurs in exon 7 of the HNF4α-encoding nucleic acid.

28. The method of claim 1, wherein a mutation occurs at codon 268, 130 or 273 of an HNF4α-
encoding nucleic acid having the sequence of SEQ ID NO:78.

29. The method of claim 1, wherein the HNF-encoding nucleic acid is an HNF4α-encoding nucleic acid
and a mutation is a mutation defined in Table 10.
30. The method of claim 1, wherein the HNF-encoding nucleic acid is an HNF1β-encoding nucleic acid
and a mutation occurs in exon 2, exon 7 or intron 8 of the HNF1β-encoding nucleic acid.

31. The method of claim 1, wherein a mutation occurs at codon 177, 463, at nucleotide 48 of intron
8, or at nucleotide 22 of intron 8 of an HNF1β-encoding nucleic acid having the sequence of SEQ ID
NO:90.

32. The method of claim 1, wherein the HNF-encoding nucleic acid is an HNF1β-encoding nucleic acid
and a mutation is a mutation defined in Table 15.

33. The method of claim 1, wherein the step of analyzing the HNF-encoding nucleic acid comprises
PCR.

34. The method of claim 1, wherein the step of analyzing the HNF-encoding nucleic acid comprises
use of an RNase protection assay.

35. The method of claim 1, wherein the step of analyzing the HNF-encoding nucleic acid comprises an
RFLP procedure.
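
By way of a simplified illustration only, the principle behind an RFLP procedure (a mutation that creates or abolishes a restriction site changes the pattern of fragment lengths) may be sketched in Python as follows. The function name `fragment_lengths`, the two short test sequences, and the choice of the EcoRI recognition site GAATTC (cut after the first G) are assumptions made for the example; they are not sequences or enzymes taken from this specification.

```python
def fragment_lengths(dna, site, cut_offset):
    """Fragment lengths after cutting a linear molecule at every occurrence of site.

    cut_offset is the number of bases of the recognition site that stay on the
    upstream fragment (1 for EcoRI, which cuts G^AATTC).
    """
    dna = dna.replace(" ", "").upper()
    cuts = []
    start = dna.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = dna.find(site, start + 1)
    edges = [0] + cuts + [len(dna)]
    return [right - left for left, right in zip(edges, edges[1:])]


# Hypothetical sequences: the variant differs by one base inside the site.
normal = "ACGT" * 3 + "GAATTC" + "ACGT" * 2
mutant = "ACGT" * 3 + "GAATTT" + "ACGT" * 2
print(fragment_lengths(normal, "GAATTC", 1))  # [13, 13]
print(fragment_lengths(mutant, "GAATTC", 1))  # [26]
```

In this toy case the loss of the site is visible as a single uncut fragment in place of two shorter ones.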

36. A method of regulating diabetes in an animal comprising the step of modulating HNF function in
the animal.

37. The method of claim 36, further comprising the step of diagnosing an animal with diabetes via
analysis of an HNF1α-encoding nucleic acid sequence for a mutation.

38. The method of claim 36, wherein the step of modulating HNF function comprises providing an
HNF1α polypeptide to the animal.

39. The method of claim 38, wherein the HNF1α polypeptide is a native HNF1α polypeptide.

40. The method of claim 39, wherein the native HNF1α polypeptide has the sequence of SEQ ID NO:
2.

41. The method of claim 38, wherein the provision of an HNF1α polypeptide is accomplished by
inducing expression of an HNF1α polypeptide.

42. The method of claim 41, wherein the expression of an HNF1α polypeptide encoded in the
animal's genome is induced.

43. The method of claim 41, wherein the expression of an HNF1α polypeptide encoded by a nucleic
acid provided to the animal is induced.
44. The method of claim 38, wherein the provision of an HNF1α polypeptide is accomplished by a
method comprising introduction of an HNF1α-encoding nucleic acid to the animal.

45. The method of claim 38, wherein the provision of an HNF1α polypeptide is accomplished by
injecting the HNF1α polypeptide into the animal.

46. The method of claim 36, wherein the step of modulating HNF function in the animal comprises
providing a modulator of HNF1α function to the animal.

47. The method of claim 46, wherein the modulator of HNF1α function is an agonist of HNF1α.

48. The method of claim 46, wherein the modulator of HNF1α function modulates transcription of an
HNF1α-encoding nucleic acid.

49. The method of claim 46, wherein the modulator of HNF1α function modulates translation of an
HNF1α-encoding nucleic acid.

50. The method of claim 36, further comprising the step of diagnosing an animal with diabetes via
analysis of an HNF4α-encoding nucleic acid sequence for a mutation.

51. The method of claim 36, wherein the step of modulating HNF function comprises providing an
HNF4α polypeptide to the animal.

52. The method of claim 51, wherein the HNF4α polypeptide is a native HNF4α polypeptide.

53. The method of claim 51, wherein the native HNF4α polypeptide has the sequence of SEQ ID
NO:79.

54. The method of claim 51, wherein the provision of an HNF4α polypeptide is accomplished by
inducing expression of an HNF4α polypeptide.

55. The method of claim 54, wherein the expression of an HNF4α polypeptide encoded in the
animal's genome is induced.

56. The method of claim 54, wherein the expression of an HNF4α polypeptide encoded by a nucleic
acid provided to the animal is induced.

57. The method of claim 51, wherein the provision of an HNF4α polypeptide is accomplished by a
method comprising introduction of an HNF4α-encoding nucleic acid to the animal.

58. The method of claim 51, wherein the provision of an HNF4α polypeptide is accomplished by
injecting the HNF4α polypeptide into the animal.
59. The method of claim 36, wherein the step of modulating HNF function in the animal comprises
providing a modulator of HNF4α function to the animal.

60. The method of claim 59, wherein the modulator of HNF4α function is an agonist of HNF4α.

61. The method of claim 59, wherein the modulator of HNF4α function modulates transcription of an
HNF4α-encoding nucleic acid.

62. The method of claim 59, wherein the modulator of HNF4α function modulates translation of an
HNF4α-encoding nucleic acid.

63. The method of claim 36, further comprising the step of diagnosing an animal with diabetes via
analysis of an HNF1β-encoding nucleic acid sequence for a mutation.

64. The method of claim 36, wherein the step of modulating HNF function comprises providing an
HNF1β polypeptide to the animal.

65. The method of claim 64, wherein the HNF1β polypeptide is a native HNF1β polypeptide.

66. The method of claim 65, wherein the native HNF1β polypeptide has the sequence of SEQ ID
NO:91.

67. The method of claim 64, wherein the provision of an HNF1β polypeptide is accomplished by
inducing expression of an HNF1β polypeptide.

68. The method of claim 67, wherein the expression of an HNF1β polypeptide encoded in the animal's
genome is induced.

69. The method of claim 67, wherein the expression of an HNF1β polypeptide encoded by a nucleic
acid provided to the animal is induced.

70. The method of claim 65, wherein the provision of an HNF1β polypeptide is accomplished by a
method comprising introduction of an HNF1β-encoding nucleic acid to the animal.

71. The method of claim 65, wherein the provision of an HNF1β polypeptide is accomplished by
injecting the HNF1β polypeptide into the animal.

72. The method of claim 36, wherein the step of modulating HNF function in the animal comprises
providing a modulator of HNF1β function to the animal.

73. The method of claim 72, wherein the modulator of HNF1β function is an agonist of HNF1β.

74. The method of claim 72, wherein the modulator of HNF1β function modulates transcription of an
HNF1β-encoding nucleic acid.
75. The method of claim 11, wherein the modulator of HNF1β function modulates translation of an
HNF1β-encoding nucleic acid.

76. A method of screening for modulators of HNF function comprising the steps of:

a) obtaining an HNF polypeptide;

b) determining a standard activity profile of the HNF polypeptide;

c) contacting the HNF polypeptide with a putative modulator; and

d) assaying for a change in the standard activity profile.
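
As a non-limiting sketch of steps b) to d) above, a change relative to a standard activity profile might be scored as a simple fold change between replicate readings taken without and with the putative modulator. Everything in the Python example below is an assumption for illustration: the function names `fold_change` and `classify`, the use of mean replicate readings as the "activity profile", and the 1.5-fold threshold are not taken from this specification.

```python
from statistics import mean


def fold_change(baseline, treated):
    """Ratio of the mean treated reading to the mean baseline reading."""
    return mean(treated) / mean(baseline)


def classify(baseline, treated, threshold=1.5):
    """Label the effect of a putative modulator using a hypothetical fold threshold."""
    change = fold_change(baseline, treated)
    if change >= threshold:
        return "increase"
    if change <= 1 / threshold:
        return "decrease"
    return "no change"


# Hypothetical reporter readings in arbitrary units.
standard = [10.0, 11.0, 9.0]
print(classify(standard, [20.0, 21.0, 19.0]))  # increase
print(classify(standard, [10.5, 9.5, 10.0]))   # no change
```

A real assay would also need replicate statistics and controls; the threshold here only marks where such a decision rule would sit.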

77. The method of claim 76, wherein the HNF polypeptide is an HNF1α polypeptide.

78. The method of claim 77, wherein the standard activity profile of the HNF1α polypeptide is
determined by measuring the binding of the HNF1α polypeptide to a nucleic acid segment comprising the
sequence of SEQ ID NO: 9.

79. The method of claim 78, wherein the nucleic acid segment comprising the sequence of SEQ ID
NO: 2 comprises a detectable label.

80. The method of claim 77, wherein the HNF1α polypeptide comprises a detectable label.

81. The method of claim 77, wherein the standard activity profile of the HNF1α polypeptide is
determined by determining the ability of the HNF1α polypeptide to stimulate transcription of a reporter
gene, the reporter gene operatively positioned under control of a nucleic acid segment comprising the
sequence of SEQ ID NO: 1.

82. The method of claim 76, wherein the HNF polypeptide is an HNF4α polypeptide.

83. The method of claim 82, wherein the standard activity profile of the HNF4α polypeptide is
determined by measuring the binding of the HNF4α polypeptide to an amino acid segment comprising the
sequence of SEQ ID NO:85.

84. The method of claim 83, wherein the nucleic acid segment comprising the sequence of SEQ ID 
NO: 1 comprises a detectable label. 

85. The method of claim 82, wherein the HNF4α polypeptide comprises a detectable label.

86. The method of claim 82, wherein the standard activity profile of the HNF4α polypeptide is
determined by determining the ability of the HNF4α polypeptide to stimulate transcription of a reporter
gene, the reporter gene operatively positioned under control of a nucleic acid segment comprising the
sequence of SEQ ID NO:78.

87. The method of claim 76, wherein the HNF polypeptide is an HNF1β polypeptide.

88. The method of claim 89, wherein the HNF1β polypeptide comprises a detectable label.

89. The method of claim 88, wherein the standard activity profile of the HNF1β polypeptide is
determined by determining the ability of the HNF1β polypeptide to stimulate transcription of a reporter
gene, the reporter gene operatively positioned under control of a nucleic acid segment comprising the
sequence of SEQ ID NO:128.

90. A method of screening for modulators of HNF function comprising the steps of: 

a) obtaining an HNF-encoding nucleic acid segment; 

b) determining a standard transcription and translation activity of the HNF nucleic acid
sequence;

c) contacting the HNF-encoding nucleic acid segment with a putative modulator;

d) maintaining the nucleic acid segment and putative modulator under conditions that
normally allow for HNF transcription and translation; and

e) assaying for a change in the transcription and translation activity. 

91. An HNF modulator prepared by a process comprising screening for modulators of HNF function 
comprising: 

a) obtaining an HNF polypeptide; 

b) determining a standard activity profile of the HNF polypeptide; 

c) contacting the HNF polypeptide with a putative modulator; and 

d) assaying for a change in the standard activity profile. 

92. An HNF modulator prepared by a process comprising screening for modulators of HNF function 
comprising: 

a) obtaining an HNF-encoding nucleic acid segment; 

b) determining a standard transcription and translation activity of the HNF nucleic acid
sequence;

c) contacting the HNF-encoding nucleic acid segment with a putative modulator;

d) maintaining the nucleic acid segment and putative modulator under conditions that
normally allow for HNF transcription and translation; and

e) assaying for a change in the transcription and translation activity. 




93. An isolated and purified polynucleotide having an HNF1α-encoding nucleic acid sequence.

94. The polynucleotide of claim 93, wherein the HNF1α encoded has an amino acid sequence as set
forth in SEQ ID NO:127.

95. The polynucleotide of claim 93, wherein the HNF1α-encoding nucleic acid sequence has a
sequence of SEQ ID NO:126.

96. An isolated and purified polynucleotide having an HNF1β-encoding nucleic acid sequence.

97. The polynucleotide of claim 96, wherein the HNF1β encoded has an amino acid sequence as set
forth in SEQ ID NO:139.

98. The polynucleotide of claim 96, wherein the HNF1β-encoding nucleic acid sequence has a
sequence of SEQ ID NO:128.

99. An isolated and purified nucleic acid segment comprising 15 contiguous nucleic acids identical to
the sequence of SEQ ID NO:128 or SEQ ID NO:126.

100. The isolated and purified nucleic acid segment of claim 99, wherein said segment encodes a
full-length HNF polypeptide.

101. The isolated and purified nucleic acid segment of claim 100, wherein said segment encodes a
promoter for the expression of an HNF polypeptide.



[Drawing sheets with plots; only the axis labels are legible: GLUCOSE, INSULIN (pmol/l), and ISR (pmol/min).]



FIG. 5A. Edinburgh Pedigree. Frameshift mutation: insertion of C in codon 289, exon 4; CCC→CCCC.

FIG. 5B. H Pedigree. Missense mutation: codon 131, exon 2; CGG (Arg)→CAG (Gln).

FIG. 5C. P Pedigree. Splicing mutation: splice acceptor site of intron 5; AG→GG.

FIG. 5D. GK Pedigree. Splicing mutation: splice donor site of intron 9; GT→AT.

[Pedigree diagrams not reproduced; individuals in the drawings are labeled NN or NM.]
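The coding consequences of the first two mutation types annotated in FIG. 5 can be checked with a short translation sketch. It uses the standard genetic code; the 15-base sequence `TOY` is hypothetical and is not taken from any sequence in this document. It serves only to show how inserting one C after a CCC codon shifts every downstream codon. The splice-site changes in FIG. 5C and 5D affect RNA processing and are not modeled here.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate complete codons from position 0; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# Missense change from FIG. 5B: CGG -> CAG.
missense = (translate("CGG"), translate("CAG"))

# Frameshift of the kind in FIG. 5A, shown on a hypothetical toy sequence:
# an extra C is inserted directly after the CCC codon.
TOY = "CAGCCCGAGCTGTAA"
shifted = TOY[:6] + "C" + TOY[6:]

print(missense)            # one-letter codes for Arg and Gln
print(translate(TOY))      # reading frame before the insertion
print(translate(shifted))  # reading frame after the insertion
```

In the toy case the residues after the proline differ completely and the original stop codon is no longer read in frame, which is the general behavior expected of a single-base insertion.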
FIG. 8A. Partial Sequence of Human HNF4 Gene
(Exon 1 SEQ ID NO:34)

GCAGAGAGGG CACTGGGAGG AGGCAGTGGG AGGGCCrArr 
GCCGCrrr^l TCGGGGTGGG CGCCCAGGGt'aGGGC^^^^^^^^ 
GCCGCGGCGT GGAGGCAGGG AGAATGCGAC TCTCGAAAAC 
rTnnn''^''^^ ATGGACATGG CCGACTACAG TGCTGCAC^ 
GACCCAGCCT ACACCACCCT GGAATTTGAG AATGTGCAGG 

ISI'L^^^^AT GGGCAATGGT AGGTGGGGGC AGATGTGCCC 
^r^lr^""^ GTGGGGGCAG GTGTGCCTGG GTCCAGGAGC 
AGATCTTTGG CACTCAACTT TGGGGTGGGA GGAGAATGAT 
GC^AlrrrT ^^^^^^^^C TACAGGCCAG CACAGGTG^ 
GCoAAGTGAA GCCCATGTGC CCAGGCACAG TGATCACAGG 

r^ll^Jr^-r^J^^ GAAGGGAGGC CTGCAAGGGC CAATTTCCAG 
CAAAAGTCGA TCCCGGCTAT TCCTCCCAGG CCCTTCCArV 
CCTCACTGCC TCACAGTGGC TCTGCTTGGC GCT^^^^^^^^ 
CGATTrlnnn ^^^^^GCTCC CCCTTGGTGC CCAGCTCCAG 
CGATTCAGCC CAGCACGGCC CCTTCGTGAA CCCCTTGGGC 

Glrrrr^rT? ^^^^^''^'^^^ AGGGATGTTG TATCCCTGGA 
GATGGTGGTT GGAGACATAA CCGCATTTCT C 
FIG. 8B. Partial Sequence of Human HNF4 Gene
(Exon 1b SEQ ID NO:36)



TGGATGTTTG TACATGTGTG CTGTGTGTGC GGGTCATAGA 
GCACATGTGT TTGTGCATGC GGACCTGTTG GAGTGCCCT€ 
TTCTTCCTGC ATCTTTATCC TGTATGGGCG TTTTGTCGTG 
TGCCCATATT TGTACCTGCT GTGTATATAT GCAGTTCCCT 
GTGCTGCGGG CGGGGGTCAG CGGTCTCTGG TGTGCACGAC 

TGCACAGACC CAAATGCAGG ACTCTGTTGT TGCCACTCAC 
CAAGTGAGAT TCATATCAGC AACATGTCCG TTTGTCTCTG 
AGCAGATTTG TTGCCGCTGC GTCTCGCCAG ATTGAGGCAT 
CCCCTCCGAC ATCACTGGAG CATATCTGGA GGGGTGGACA 
GT7CTCCACA GGGAGGTAGG GGAAAAGAGG AGGCCCGGAA 

ACCCCTCCTG GAGGGAAGAG CCCCATCGGT CCCAGGCCAG 
CCTCAGAGGA GAGGGQGCAG GCAGCTGGCT GAGGTCAGCC 
TYGCCACCCTG CTTCCTTCTG TGTCTTGGAG CCACTCAGCC 
AGTATGAGGC TGCAGCTCCA GCTGAGGTCT GGAATCTTGT 
GGTCAGCTCA GCTAGGGTGA GGAGGCAGCT GCTGGGCACT 

GCTTGTTGTC AGCTCAGCAG GTGCTCACCT GCCCCTGCCG 
TCCAGTCACG TGTGACCTTG GGCATGTCAC CTCCCCTATC 
CTGGCTTCTG TATCTTCTAC AAAACAGGCT TCATTCCCCC 
AGGCCTGCTG GCTGGACGGC TTTTAGGCCT GTCTGAGGAC 
CACGCCAGGA GCGCAAGGCA AAAACACACC AGAGAT 
FIG. 8C. Partial Sequence of Human HNF4 Gene
(Exon 2 SEQ ID NO:38)



CCCCTTGCGA GTTAGGAGGC CGGCTCCCAC CCCAGAAGGT 
GGCCAGGTTT TCATGCCTTC CTAGAGAAAG CTGGGGCTGC 
TGGCCTCCAC CACAGGGAGA CGCAGACCCT CAGAAACAAG 
TCTGTGAAGT CACAACCAGC CCCAGTTTAC AGATGTGAAA 
CTGAAGCTCC AAAAAGTCAG GAGGTCACTG AGTGGGGAGG 

TGATGGAGTG GAACAGCCCC CAGATCTGGC TGAGGCCGAA 
GCCCTGGAGA GATCCCCGCA AGGCTCCCTT AGATGCCTGA 
CATTCTGTTC TTCCTGAAGC CTCACTCCCT TCTCTCCTGG 
CGCAGACACG TCCCCATCAG AAGGCACCAA CCTCAACGCG 
CCCAACAGCC TGGGTGTCAG CGCCCTGTGT GCCATCTGCG 

GGGACCGGGC CACGGGCAAA CACTACGGTG CCTCGAGCTG 
TGACGGCTGC AAGGGCTTCT TCCGGAGGAG CGTGCGGAAG 
AACCACATGT ACTCCTGCAG GTGAGGAGCC TCAATTJCTT 
CAGCTGGGAA ATGGGCACAC TTGGGCTCAT GGCCCCAAGG 
TCTGTCTTCT CCCTGAGTGG GTAGGTCCCA GAGACAGCTG 

CCCTTCAGGG CCTTCAAGGC TCCTTCTGGTT TTGT 
FIG. 8D. Partial Sequence of Human HNF4 Gene
(Exon 3, SEQ ID NO:40)

AGAGAGTTCA TAGCACCTTT CCAGCTCCTG GTGGGTTCAA 
GAGAGAACTC CCGGGATGAA GAGATGAGAG CACTGAGGTT 
GGGGGGTCAA CTGGATAGCC AGGGCCCTAG TTCTGTCCTA 
AGAGGAGGAA GTTGTGTCTT CTCCATCCAA CCATCCAAAAG 
ACCTCCCCAG ATTTAGCCGG CAGTGCGTGG TGGACAAAGA 

CAAGAGGAAC CAGTGCCGCT ACTGCAGGCT CAAGAAATGC 
TTCCGGGCTG GCATGAAGAA GGAAGGTGAG CCTCGGCCCT 
CCCCGCCCCA CCACCACTGC ACCACCTGCA CCCACAGCTC 
CCCGACAGTC ATTTACAACT GTAGCCACAC TTTATGACTC 
AGTGGCAGGC CCCAGGGTGA CTGGCTAATG GCTGAGAAGA 

GGGAGGGCCT GGAAATCTGA CCATAGGGAG CGGCTGGGCT 
TGGTCTTGAG AAAGATTC 
FIG. 8E. Partial Sequence of Human HNF4 Gene
(Exon 4 SEQ ID NO:42)

tcccactcct catcagtcac agacaccccc accccctact 

ccatccctgt tctccctcct cacctctctg tgcctcctca 

cagCCGTCCA GAATGAGCGG GACCGGATCA GCACTGGAAG 

GTCAAGCTAT GAGGACAGCA GCCTGCCCTC CATCAATGCG 

CTCCTGCAGG CGGAGGTCCT GTCCCGACAG GTACCGGGGT 

GATCCTGCCA CCCACCCAGG GGATCCCCCA CACTACAGAG 
GAGCTCACCT CCTCCACCTC CATTCTCCCC AGCCAGGCCC 
TGGAGCAGCT GACGGGAGGG GCCTCAGATA TTACAGAAGG 
GACACTGAGT GCGGTTTCAC ATGGCCCAGT TTGCAGCAAG 
GGCAGGAATC GAACCTGGCG CCCTGGGGCA CTTTCTAATT 

CATCCTACTG CCTGCATCCC ACAGGCCAAG CAGAGTCTTC 
ACCTTCACTG AGGGCCTGCG ATCAGCTCAG CTCCGAGAGA 
ACAGAGCAGT GGCTCAGTGG AGAGAGGTGG CAAAGTGGGG 
CCCAGCCCTT CCCTTGCTGA GTGACCTTGG GCAAGTCACA 
GCACCTCTCT GAGCCATGGT TGCCTCATTG TCAGAAAAGG 

ATGATGATTT TTTGCCTGC TTCTCCTCTA AGGCTGACAG 
ACTCCTTGGG GCTCTAAAGC TG 
FIG. 8F. Partial Sequence of Human HNF4 Gene
(Exon 5, SEQ ID NO:44)



TTCTCCTCA TCCCTGCCTC CTCCCTCCCT CCGTTTTTAC 
CCTGAGCTTC CTTCAGAGCT GGAGGGCACC CACTATCCAG 
CCCCCTCCCC ACATCTGATT CCAGGGAGGG GGCTCTGTGC 
AGGGGACAGA GAATGCGGGA GGGCCCGGAC ATCTCCAGCA 
TTTTCTTCCC TGTATCTCTC GAAGATCACC TCCCCCGTCT 

CCGGGATCAA CGGCGACATT CGGGCGAAGA AGATTGCCAG 
CATCGCAGAT GTGTGTGAGT CCATGAAGGA GCAGCTGCTG 
GTTCTCGTTG AGTGGGCCAA GTACATCCCA GCTTTCTGCG 
AGCTCCCCCT GGACGACCAG GTGAGGATGG GCGTGGATGG 
TGGGCAGTAG TGGGCAGTGG GCGGGGCAGC CAGGGGGCTG 

CTGGCCCACC TGGGATATAG CCGTGGACTG GCTTGATTTT 
ATTTTATTTA ACAAAATATG TAGTGCACAC ACGTGTCTGA 
AACTTTAAAT CACCTTACAA ATATTAACTC AGTTAGCTCC 
TCCAACAACT CTATGAGGTA GGTACTAAGG TACTATTATT
ACTGCCATCT CATAGGTGAG AGATTGGGGC ACAGAGAGGT 

TAAGTAACCT GCTCAAGGTC ACATAGCTAC TATCCAGCAT 
AGCTGGG 
FIG. 8G. Partial Sequence of Human HNF4 Gene
(Exon 6, SEQ ID NO:46)



ATTTTTACAA AGCACCCTTC ATAATTCTCC ATAGCTGGTC 
CATGGGTGGG AATTTGGGAC CCACAGTTTT GGAACTTTTT 
GGGATCATAG ACCTTTTTGA GAATCTCAAA AAAGAAAAAA 
AAGCACACAG AATGTTGCTT ACAGTTTCAT CAGGCACACA 
GAAGAGGCCC AGCACGAAGC AGTTTCTTGC CCAAGGACAC 

AGCAGTTCAA GGACAGAGTC AGCGCGAGGT CTCTCAGCTC 
TGAGCACATG TTCTTTCCCC TTCCAGGTTT CTAGTTTTAT 
GGGTAGTAGT TTTATGATGC CCATTTCACA GTTCAGGCAG 
GTAGAGGCAG AGGGGAGCAT TAAGCTGACT TGCCCAGCGT 
CACTGAGTTG GCTACGGGCA GCCTTCCCAA GGGTACAGAT 

GGCAAACACT GTTCCTTATC TCTTTCAGGT GGCCCTGCTC 
AGAGCCCATG CTGGCGAGCA CCTGCTGCTC GGAGCCACCA 
AGAGATCCAT GGTGTTCAAG GACGTGCTGC TCCTAGGTGA 
GGCGGCTGCC TGCCCTGGCC AGGGCTCCAG GGAGGGTATG 
CCTAGCATGG CACTCACCCA GGCAAGGAGA TTCACATGGT 

GGCATGCAAG GGTGAGGGAG ACTAGTCAGG AGTGGCCCTG 
TCCTCAGGCT TGCATTGGAG GGCTCCAGGA CTCAGTTTTC 
AACTGGGTAC CCCACTCAGA TGCAAGGAAA TGTGGATGCA 
AGTCACCAAA TTCCCAGCAT TGAAGTCAGA GCACGATCAG 
GGTTATCCCT GGAATTACCT GTGCATCCTT TTTTCTTTTG 

ACAGAGTCTT GCTCTGTCAC TCAGGCTGGA GTGCAATGAT 

GTGA 
FIG. 8H. Partial Sequence of Human HNF4 Gene
(Exon 7, SEQ ID NO:48)

GCAACACTAG TATTTTAATA TAACAATGCT ATGAGGGAGC 
TCGATTATTT ATCCTCATCT TATAGATAAG AAAACTGAGG 
CACAGAGAGG TTAAGTAACT TATCCAACTA TAACCAGCTA 
TCAGGGGCAG AGCCATTTAA GCAGGGCAGT GCAGTTCCAG 
AATCTGGTCC TTTAACCTTG ATGCTTTGGT GCCTATCAGG 

TGACCTTTGA ATGTCATCGA TCTTGTGAGT CATGTTGGTA 
AATGGAGCTT GGGTCATGTG AAAGAGGTCC TAGAAAGCCA 
AGTTCCAAGC TCAGCCGGAT GACTCAAGGC AGCTTATCTT 
CTGAATCTGG GCCTCAGCTT CCTTACCTGT GAAATGGGAG 
TCACCATCCC TGCAGGTCCT CCTCCCACAG GCACCAGCTA 

TCTTGCCAAC TTAAAAGCCA AAACTAGAGG AGAGGGGTCA 
ACCCAAAGTG ACTTCCCATC CTCCCTCCCT CCCAACCCTT 
CCAGGCAATG ACTACATTGT CCCTCGGCAC TGCCCGGAGC 
TGGCGGAGAT GAGCCGGGTG TCCATACGCA TCCTTGACGA 
GCTGGTGCTG CCCTTCCAGG AGCTGCAGAT CGATGACAAT 

GAGTATGCCT ACCTCAAAGC CATCATCTTC TTTGACCCAG 
GTACAGTGCA CACCTCCTAA GCCATCCCTG ACTCTCTCTC
CAGAACGCTC TGCCAGACTT CTCCTATTGG GTTCTGTACA 
CTGAGTTCAC AGCCTCATCT CATGTTAACG ACAGCCAGGA 
GAGGCCGTTT TCATTTAACA GATGAGGCAA GTCAAGATTT 

GAAGAGACAA TATGGCCGGG CGCAGTGGCT CACACCTGTA 
ATCCCATCAC TTTGGGAGGC TGAGGCGGGC GGATCACCTG 
AGGTCAGGGG TCAAGATGAG CCTGGCTAAC ATGGAGAAAC 
CCCATCTCTA CTTAAAA 






FIG. 8I. Partial Sequence of Human HNF4 Gene
(Exon 8 SEQ ID NO:50)



GTGGCTCTGC CAACAACTGG CTGTGCGACC CAGGACAAGT 
CCTATCTTTG CACTGTGTCT GGGTTTCCCC GTGTGTAAGA
TGAGGCGGTT GCTAGGTGCT TATTGGATGC ATTCCTCAAG 
TCCCGCCCTC CATCTCCTAT TCCCCTCTCT TCTGGTTTAG 
TGCTTTAGGA AATGTGGCAG AAATC i 1 I I i CTGCCTGTGT 

CTAGGAAATC ATAATTCATG CTGGCGTACC CTGGTTGTTG 
AGGTCCCTGA ATCCTTGTGC CCACACTGCT GAAGACTCCT 
TGTGTGACAC AAGTCAGGGG ACATCTGGGT CTTGACTCCC 
CAGATGCTCC AGGTGGACCC TGCTGCCCTC CCTTGCCCAC 
CCTCTTCCAT TGTAGATGCC AAGGGGCTGA GCGATCCAGG 

GAAGATCAAG CGGCTGCGTT CCCAGGTGCA GGTGAGCTTG 
GAGGACTACA TCAACGACCG CCAGTATGAC TCGCGTGGCC 
GCTTTGGAGA GCTGCTGCTG CTGCTGCCCA CCTTGCAGAG 
CATCACGTGG CAGATGATCG AGCAGATCCA GTTCATCAAG 
CTCTTCGGCA TGGCCAAGAT TGACAACCTG TTGGAGGAGA 

TGCTGCTGGG AGGTCCGTGC CAAGCCCAGG AGGGGCGGGG 
TTGGATTGGG GACTCCCCAG GAGACAGGCC TCACACAGTG 
AGCTCACCCC TCAGCTCCTT GGCTTCCCCA CTGTGCCGCT 
TTGGGCAAGT TGCTTAACCT GTCTGTGCCT CAGTTTCCTC 
ACCAGAAAAA TGGGAACAAG GCAATGGTCT ATTTGTTCAG 

GCACCGAGAA CCTAGCACGT GCCAGTCACT GTTCTAAGTG 
CTGGCAATTC AGCAAAGAAC AAGATCTTTG CCCTCGGGGA 
GGCTGTGTGT GTGTGATAT GTATGGATGC GTGGATATCT 
GTGTATATGC CGGTATGTGC GTGCATGTGT ATATAAAGCC 
TCACATTTTA TGATTTTGA 
FIG. 8J. Partial Sequence of Human HNF4 Gene
(Exon 9, SEQ ID NO:52)



GGGACACATA GATGCTATAA GTAGGTCAGT TGGCTGCAGC 
AGAGATGTGG GGGATGAGGC TGAAAGGTGA GGCGGGAGCA 
AATGGTTGAA GGACTTGCAC TCCAAGGAGC TTTGAGAGCC 
ATTGATTACA TCCATTATGT TACTATGTGA CCAATACATT 
ACTCATTAGA ACATTTACGT GATCTCAGAG CTTCCTTATA 

TGCACCTTGT TCCTTTCAAC TCACTTTTGT TCTCTTGGTT 
TTTTGGGGTC CTCTTAACAC CCTCATGAAG TCTATAGATG 
GGAATGGTAC ACCCTAGTTT ACTAACCCAG GAATAGGTAC 
CCAACAGGCA CTGCCAATAT TGGATGGGCT GGTTGATTGG 
CCACGCCTGA GGAAGATGGC GTCCCAAGGC CTGAGGTCTG 

CATCCCAGAC TCTCCATCCT GATCGACCTT CTCTACCTGC 
AGGGTCCCCC AGCGATGCAC CCCATGCCCA CCACCCCCTG 
CACCCTCACC TGATGCAGGA ACATATGGGA ACCAACGTCA 
TCGTTGCCAA CACAATGCCC ACTCACCTCA GCAACGGACA 
GATGTGTGAG TGGCCCCGAC CCAGGGGACA GGCAGGTGGG 

CAAACTCTGG GATTTTACCT TGCAAAGGGT GAGGATGGGG 
CTTAAGACAG GAGGCAGGAG AAAGTGGAGT CTAGAAGGTA 
GAACCAGGAT GCAACAGTTT TCTGGGTTCC AGGGTAGGGA 
ATAAAGGGCA AGATTGTCCA TTTGTTGAGG CTGTTTATTC 
AGTAAGGTGA CTGACAGCCT TTACTGAATG AAGCCATTGT 

TGGGATGAGG CAATCCACTG GATGAGGTAA CCCATTGGGT 
GAAGATGTCT TGGGTGAGAA TTCCATTAGT TGACATTGTC 
CATTAAGTAA AAGTGGTCAT TGAAGTAAGG CTGCACAGTT 
GGGTAAGGCT ATCCATTAGA CATTAGATGA GACTACCCAT 
TGGGTCAGGA TGTCTGCTGG GCTA 
FIG. 8K. Partial Sequence of Human HNF4 Gene
(Exon 10 SEQ ID NO:54)



TTTGGGAGAA GCAGTCCAAG TCTGCATATC AAATAAATGA 
TGGAGGAGAT GGGTGGTAGG ACCTTCCAGA CCTCATAAAA 
CTTAGGCTTT ATGATCTGGG ACTCACAGAA GGTTGAGCAA 
TAAAAGACCT TAGGGATTAT CTGGCTTAAT TAATTCTCTC 
ATTTTATAGA GGAAGAAATT AAGTCAAGGT GGGGCAGGGT 

GGGAGGGGAG AACTTTCCCG GGGCTCTTCA TTTACTCCCA 
CAAAGGCTGG AATTTTGAGC AGCCCCTGTC TGTCTGTTTG 
TCCTTCCAGC CACCCCTGAG ACCCCACAGC CCTCACCGCG 
AGGTGGCTCA GGGTCTGAGC CCTATAAGCT CCTGCCGGGA 
GCCGTCGCCA CAATCGTCAA GCCCCTCTCT GCCATCCCCC 

AGCCGACCAT CACCAAGCAG GAAGTTATCT AGCAAGCCGC 
TGGGGCTTGG GGGCTCCACT GGCTCCCCCC AGCCCCCTAA 
GAGAGCACCT GGTGATCACG TGGTCACGGC AAAGGAAGAC 
GTGATGCCAG GACCAGTCCC AGAGCAGGAA TGGGAAGGAT 
GAAGGGCCCG AGAACATGGC CTAAGGCACA TCCCACTGCA 

CCCTGACGCC CTGCTCTGAT AACAAGACTT TGACTTGGGG 
AGACCCTCTA CTGCCTTGGA CAACTTTCTC ATGTTGAAGC 
CACTGCCTTC ACCTTCACCT TCATCCATGT CCAACCCCCG 
ACTTCATCCC AAAGGACAGC CGCCTGGAGA TGACTTGAGC 
CTTACTTAAA CCCAGCTCCC TTCTTCCCTA GCCTGGTGCT 

TCTCCTCTCC TAGCCCCGGT CATGGTGTCC AGACAGAGCC 
CTGTGAGGCT GGGTCCAATT GTGGCACTTG GGGCACCTTG 
CTCCTCCTTC TGCTGCTGCC CCCACCTCTG CTGCCTCCCT 
CTGCTGTCAC CTTGCTCAGC CATCCCGTCT TCTCCAACAC 
CACCTCTACA GAGGCCAAGG AGGCCTTGGA AACGATTCCC 

CCAGTCATTC TGGGAACATG TTGTAAGCAC TGACTGGGAC 
CAGGCACCAG GCAGGGTCTA GAAGGCTGTG GTGAGGGAAG 
ACGCCTTTCT CCTCCAACCC AAC 
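The listings in FIG. 8A through 8K are printed in space-separated blocks of ten bases. As a minimal sketch, assuming a listing has been copied into a list of text lines, the blocks can be joined into one continuous sequence and checked for characters outside A, C, G and T. The function names `join_blocks` and `unexpected_characters` are illustrative and do not come from this document.

```python
def join_blocks(lines):
    """Join space-separated sequence blocks into one uppercase string."""
    return "".join("".join(line.split()) for line in lines).upper()


def unexpected_characters(sequence, alphabet="ACGT"):
    """Return (position, character) pairs outside the alphabet; positions are 1-based."""
    return [(pos, ch) for pos, ch in enumerate(sequence, start=1) if ch not in alphabet]


# The last two printed lines of FIG. 8K, used here as sample input.
tail_of_fig_8k = [
    "CAGGCACCAG GCAGGGTCTA GAAGGCTGTG GTGAGGGAAG",
    "ACGCCTTTCT CCTCCAACCC AAC",
]
sequence = join_blocks(tail_of_fig_8k)
print(len(sequence), unexpected_characters(sequence))
```

Because `join_blocks` uppercases its input, lowercase runs such as those at the start of FIG. 8E are treated the same as uppercase bases; a caller who needs to keep the case distinction would have to drop the `.upper()` call.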